HuggingFaceFW/fineweb-edu
Text Generation | EN | odc-by
Created by HuggingFaceFW in 2026, HuggingFaceFW/fineweb-edu is an English (EN) text generation dataset distributed in Parquet format. It has 383.5K downloads and 1.2K likes, is released under the odc-by license, and falls in the 1B<n<10B size category.
About HuggingFaceFW/fineweb-edu
FineWeb-Edu
1.3 trillion tokens of the finest educational data the web has to offer
Paper: https://arxiv.org/abs/2406.17557
What is it?
The FineWeb-Edu dataset consists of 1.3T tokens and 5.4T tokens (FineW...
Details
- Task: Text Generation
- Language: EN
- Format: Parquet
- Rows / instances: N/A
- Size: 1B<n<10B
- Creator: HuggingFaceFW
- Year: 2026
- License: odc-by
- Downloads: 383463
- Likes: 1165
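
Given the Parquet format and the 1B<n<10B size category listed above, streaming a handful of rows is usually more practical than downloading everything. The following is a minimal sketch, not an official recipe: it assumes the Hugging Face `datasets` library is installed, that a smaller subset named `sample-10BT` is available, and that each row carries a `text` column. None of these details are stated in this listing. The helper names `stream_rows` and `preview` are hypothetical.

```python
from itertools import islice
from typing import Iterator


def stream_rows(repo_id: str = "HuggingFaceFW/fineweb-edu",
                config: str = "sample-10BT",  # assumed subset name, not given in the listing
                limit: int = 3) -> Iterator[dict]:
    """Yield the first `limit` rows without downloading the full dataset."""
    # Imported lazily so that `preview` below works even without `datasets` installed.
    from datasets import load_dataset

    # streaming=True reads records on demand instead of fetching every Parquet file.
    ds = load_dataset(repo_id, name=config, split="train", streaming=True)
    yield from islice(ds, limit)


def preview(row: dict, width: int = 80) -> str:
    """Return a one-line preview of a row's text (assumes a `text` column)."""
    # Collapse all whitespace runs (including newlines) into single spaces.
    text = " ".join(str(row.get("text", "")).split())
    # Truncate long documents and mark the cut with an ellipsis.
    return text if len(text) <= width else text[: width - 3] + "..."
```

With network access, `for row in stream_rows(limit=3): print(preview(row))` would print short previews of the first three records, provided the assumptions above hold.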