epfml/FineWeb-HQ
Text Generation · EN · odc-by
The epfml/FineWeb-HQ dataset is an English-language text generation dataset released by epfml in 2025. It has 112.3K downloads and 9 likes. It is released under the odc-by license, and its size category is 1B<n<10B.
About epfml/FineWeb-HQ
FineWeb-HQ
Dataset Summary
FineWeb-HQ is a high-quality, model-filtered pretraining dataset derived as a subset of FineWeb. FineWeb-HQ was created by selecting the top 10% of FineWeb documents based on a deep learning classifier trai...
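The selection step described above ranks documents by a classifier-assigned quality score and keeps the top 10%. The sketch below shows only that ranking-and-cutoff step. It assumes the classifier scores are already computed (the classifier itself is not shown), and `select_top_fraction` is a hypothetical helper written for illustration, not part of the dataset's actual pipeline.

```python
from typing import List, Sequence


def select_top_fraction(
    docs: Sequence[str], scores: Sequence[float], fraction: float = 0.1
) -> List[str]:
    """Keep the highest-scoring `fraction` of documents (hypothetical helper).

    Assumes `scores[i]` is a precomputed quality score for `docs[i]`,
    where a higher score means higher predicted quality.
    """
    if len(docs) != len(scores):
        raise ValueError("docs and scores must have the same length")
    if not docs:
        return []
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")

    # Number of documents to keep; at least one so tiny inputs are not emptied.
    k = max(1, round(len(docs) * fraction))

    # Rank document indices by score, highest first (stable for ties).
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    keep = set(ranked[:k])

    # Return the kept documents in their original corpus order.
    return [doc for i, doc in enumerate(docs) if i in keep]


if __name__ == "__main__":
    docs = [f"doc{i}" for i in range(20)]
    scores = [i / 20 for i in range(20)]
    print(select_top_fraction(docs, scores))  # prints ['doc18', 'doc19']
```

At corpus scale, a percentile threshold computed over all scores would replace the in-memory sort, but the cutoff logic is the same.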
Details
- Task: Text Generation
- Language: EN
- Format: Parquet
- Rows / instances: N/A
- Size: 1B<n<10B
- Creator: epfml
- Year: 2025
- License: odc-by
- Downloads: 112,266
- Likes: 9