epfml/FineWeb2-HQ
Text Generation | RU, ZH, DE | odc-by
Created by epfml in 2025, epfml/FineWeb2-HQ is a text generation dataset tagged with Russian (RU), Chinese (ZH), and German (DE) and distributed in Parquet format. It has 24.7K downloads and 69 likes. It is released under the odc-by (Open Data Commons Attribution) license and falls in the 100M<n<1B size category.
About epfml/FineWeb2-HQ
FineWeb2-HQ
Dataset summary
FineWeb2-HQ is a high-quality, model-filtered pretraining dataset: a subset of FineWeb2 spanning 20 languages. It enables around 6x faster pretraining compared to the base dataset. FineWeb2-HQ w...
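"Model-filtered" means a trained model scores each document and only the best-scoring documents are kept. The sketch below shows that selection step in generic form. It is not the FineWeb2-HQ pipeline: the `score` callable is a hypothetical stand-in for the quality classifier, and the 10% `keep_fraction` default is an illustrative assumption, not a parameter confirmed by this listing.

```python
from typing import Callable, Sequence


def select_top_fraction(
    docs: Sequence[str],
    score: Callable[[str], float],
    keep_fraction: float = 0.1,  # hypothetical default, not FineWeb2-HQ's setting
) -> list[str]:
    """Keep the highest-scoring `keep_fraction` of docs, preserving input order."""
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    if not docs:
        return []
    # `score` stands in for a learned quality model (hypothetical here).
    scores = [score(d) for d in docs]
    k = max(1, round(len(docs) * keep_fraction))
    # Rank by descending score; ties are broken by original position.
    ranked = sorted(range(len(docs)), key=lambda i: (-scores[i], i))
    keep = set(ranked[:k])
    # Return survivors in their original order, as a filtered subset would be.
    return [d for i, d in enumerate(docs) if i in keep]
```

With a toy scorer such as the fraction of alphabetic characters, `select_top_fraction(["aaaa", "a1!!", "ab c", "!!!!"], scorer, 0.5)` keeps `"aaaa"` and `"ab c"`. Keeping a fixed fraction rather than a fixed score threshold makes the output size predictable regardless of how the scorer is calibrated.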
Details
- Task: Text Generation
- Language: RU, ZH, DE
- Format: Parquet
- Rows / instances: N/A
- Size: 100M<n<1B
- Creator: epfml
- Year: 2025
- License: odc-by
- Downloads: 24726
- Likes: 69
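Because the data ships as Parquet shards on the Hugging Face Hub, one way to inspect it without downloading everything is streaming with the `datasets` library. The sketch below assumes a config name of `rus_Cyrl` (modeled on FineWeb2's `language_Script` naming), a `train` split, and a `text` column; none of these are stated in this listing, so check the dataset's file browser first. `first_texts` and `stream_fineweb2_hq` are hypothetical helper names.

```python
from itertools import islice
from typing import Iterable


def first_texts(rows: Iterable[dict], n: int, field: str = "text") -> list[str]:
    """Return the `field` value of the first n rows of any iterable of dicts."""
    # `text` as the column name is an assumption, not confirmed by the listing.
    return [row[field] for row in islice(rows, n)]


def stream_fineweb2_hq(config: str = "rus_Cyrl"):
    """Stream one language config of epfml/FineWeb2-HQ (config name assumed)."""
    # Imported lazily so `first_texts` works without `datasets` installed.
    from datasets import load_dataset

    # streaming=True iterates over the remote Parquet shards lazily
    # instead of downloading the full dataset first; "train" is assumed.
    return load_dataset(
        "epfml/FineWeb2-HQ", name=config, split="train", streaming=True
    )
```

For example, `first_texts(stream_fineweb2_hq("deu_Latn"), 3)` would fetch the first three German documents, assuming a `deu_Latn` config exists.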