HuggingFaceFW/fineweb
Text Generation, EN, odc-by
HuggingFaceFW/fineweb is an English (EN) text generation dataset from HuggingFaceFW, distributed in Parquet format under the odc-by license. It falls in the 10B<n<100B size category and has been downloaded 255K times.
About HuggingFaceFW/fineweb
🍷 FineWeb
15 trillion tokens of the finest data the 🌐 web has to offer
What is it?
The 🍷 FineWeb dataset consists of more than 18.5T tokens (originally 15T tokens) of cleaned and deduplicated English web data from C...
Details
- Task: Text Generation
- Language: EN
- Format: Parquet
- Rows / instances: N/A
- Size: 10B<n<100B
- Creator: HuggingFaceFW
- Year: 2026
- License: odc-by
- Downloads: 255018
- Likes: 2906