HuggingFaceM4/WebSight

General NLP | EN | cc-by-4.0

Created by HuggingFaceM4 in 2024, HuggingFaceM4/WebSight is a General NLP dataset in English containing 2,745,658 records in Parquet format. It has 9.3K downloads and 395 likes. It is released under the cc-by-4.0 license and falls in the 1M<n<10M size category.

About HuggingFaceM4/WebSight

WebSight is a large synthetic dataset of HTML/CSS code for synthetically generated English websites, each accompanied by a corresponding screenshot. This dataset serves ...

Details

Task: General NLP
Language: EN
Format: Parquet
Rows / instances: 2,745,658
Size: 1M<n<10M
Creator: HuggingFaceM4
Year: 2024
License: cc-by-4.0
Downloads: 9,255
Likes: 395
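
The Size field is a bucket label derived from the row count. As a minimal sketch of that relationship, the hypothetical helper below maps a row count to a label of the same "1M<n<10M" form. The bucket list, the treatment of exact boundaries (lower bound inclusive), and the fallback label are assumptions made for illustration, not something this page specifies.

```python
# Hypothetical helper: map a row count to a size-bucket label.
# Assumption: labels follow the "1M<n<10M" pattern shown in the Size field.
# Assumption: a count exactly on a boundary belongs to the higher bucket.
BUCKETS = [
    (1_000, "n<1K"),
    (10_000, "1K<n<10K"),
    (100_000, "10K<n<100K"),
    (1_000_000, "100K<n<1M"),
    (10_000_000, "1M<n<10M"),
    (100_000_000, "10M<n<100M"),
]


def size_bucket(rows: int) -> str:
    """Return the label of the first bucket whose upper bound exceeds rows."""
    if rows < 0:
        raise ValueError("row count must be non-negative")
    for upper, label in BUCKETS:
        if rows < upper:
            return label
    # Placeholder for anything larger than the buckets listed above.
    return "n>=100M"


if __name__ == "__main__":
    # Row count taken from the Details list on this page.
    print(size_bucket(2_745_658))
```

With the row count listed above, the helper returns the same label as the Size field.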
