HuggingFaceFW/fineweb-2

Text Generation | AAI, AAK, AAU | odc-by

The HuggingFaceFW/fineweb-2 dataset is a text generation resource published by HuggingFaceFW in 2024, listed with the language codes AAI, AAK, and AAU. It has 99.1K downloads and 827 likes. It is released under the odc-by license and falls in the 1B<n<10B size category.

About HuggingFaceFW/fineweb-2

🥂 FineWeb2

A sparkling update with 1000s of languages

What is it?

This is the second iteration of the popular 🍷 FineWeb dataset, bringing high quality pretraining data to over 1000 🗣️ languages. The 🥂 FineWeb2 dataset is fu...

Details

Task: Text Generation
Language: AAI, AAK, AAU
Format: Parquet
Rows / instances: N/A
Size: 1B<n<10B
Creator: HuggingFaceFW
Year: 2024
License: odc-by
Downloads: 99062
Likes: 827