HuggingFaceFW/fineweb-edu

Text Generation | EN | odc-by

Created by HuggingFaceFW in 2026, HuggingFaceFW/fineweb-edu is an English-language text generation dataset distributed in Parquet format. It has 383.5K downloads and 1.2K likes, is released under the odc-by license, and is listed in the 1B<n<10B size category.

About HuggingFaceFW/fineweb-edu

📚 FineWeb-Edu

1.3 trillion tokens of the finest educational data the 🌐 web has to offer

Paper: https://arxiv.org/abs/2406.17557

What is it?

📚 FineWeb-Edu dataset consists of 1.3T tokens and 5.4T tokens (FineW...

Details

Task: Text Generation
Language: EN
Format: Parquet
Rows / instances: N/A
Size: 1B<n<10B
Creator: HuggingFaceFW
Year: 2026
License: odc-by
Downloads: 383,463
Likes: 1,165
