epfml/FineWeb2-HQ

Text Generation | RU, ZH, DE | odc-by

Created by epfml in 2025, epfml/FineWeb2-HQ is a text generation dataset listed for RU, ZH, and DE and distributed in Parquet format. It has 24.7K downloads and 69 likes, is released under the odc-by license, and falls in the 100M<n<1B size category.

About epfml/FineWeb2-HQ

Dataset summary: FineWeb2-HQ is a high-quality, model-filtered pretraining dataset derived as a subset of FineWeb2, spanning 20 languages. It enables around 6x faster pretraining compared to the base dataset. FineWeb2-HQ w...

Details

Task: Text Generation
Language: RU, ZH, DE
Format: Parquet
Rows / instances: N/A
Size: 100M<n<1B
Creator: epfml
Year: 2025
License: odc-by
Downloads: 24,726
Likes: 69
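
Since the dataset is hosted under the repository id epfml/FineWeb2-HQ in Parquet format, one plausible way to peek at a few rows without downloading everything is to stream it with the Hugging Face `datasets` library. The following is a minimal sketch, not an official loading recipe: the helper name `stream_first_rows` is hypothetical, the `train` split is an assumption, and the configuration name must be taken from the dataset page, because this listing does not state the available configurations.

```python
REPO_ID = "epfml/FineWeb2-HQ"  # repository id as shown on this page


def stream_first_rows(config_name, n=3):
    """Return the first n rows of one language configuration as dicts.

    config_name is an assumption supplied by the caller; look up the
    real configuration names on the dataset page before using this.
    """
    if n < 1:
        raise ValueError("n must be at least 1")

    # Imported lazily so the module can be loaded without `datasets` installed.
    from datasets import load_dataset

    # streaming=True iterates over the remote Parquet shards
    # instead of downloading the whole dataset first.
    stream = load_dataset(
        REPO_ID,
        name=config_name,
        split="train",  # assumed split name
        streaming=True,
    )
    return list(stream.take(n))
```

A call such as `stream_first_rows("deu_Latn", n=2)` would return two rows for German, assuming a configuration with that name exists; the name here is a guess based on FineWeb2-style language codes and should be verified before use.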
