
epfml/FineWeb-HQ

Text Generation | EN | odc-by

The epfml/FineWeb-HQ dataset is an English text generation resource published by epfml in 2025. It has 112.3K downloads and 9 likes, is released under the odc-by license, and falls in the 1B<n<10B size category.

About epfml/FineWeb-HQ

Dataset Summary: FineWeb-HQ is a high-quality, model-filtered pretraining dataset derived as a subset of FineWeb. It was created by selecting the top 10% of FineWeb documents based on a deep learning classifier trai...
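
The summary describes keeping the top 10% of documents ranked by a classifier score. A minimal sketch of that kind of top-fraction selection is shown here, purely for illustration. The helper name `select_top_fraction`, the toy documents, and the made-up scores are all assumptions; this is not the authors' pipeline, and the real classifier and its scores are not reproduced.

```python
def select_top_fraction(docs, scores, fraction=0.10):
    """Keep the highest-scoring `fraction` of docs (hypothetical helper)."""
    if len(docs) != len(scores):
        raise ValueError("docs and scores must have the same length")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Number of documents to keep, rounded down but never fewer than one.
    keep = max(1, int(len(docs) * fraction))
    # Rank document indices by score, highest first.
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    # Restore the original document order among the kept items.
    kept = sorted(ranked[:keep])
    return [docs[i] for i in kept]


# Toy data: 20 placeholder documents with made-up quality scores.
docs = [f"doc-{i}" for i in range(20)]
scores = [i / 20 for i in range(20)]
top = select_top_fraction(docs, scores)
print(top)
```

Ranking the whole collection and cutting at a fixed fraction, as sketched here, differs from thresholding on an absolute score: the cutoff adapts to whatever score distribution the classifier produces.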

Details

Task: Text Generation
Language: EN
Format: Parquet
Rows / instances: N/A
Size: 1B<n<10B
Creator: epfml
Year: 2025
License: odc-by
Downloads: 112266
Likes: 9
