HuggingFaceFW/fineweb

Text Generation, EN, odc-by

HuggingFaceFW/fineweb is an English text generation dataset published by HuggingFaceFW in Parquet format. It is distributed under the odc-by license, falls in the 10B<n<100B size category, and has been downloaded 255K times.

About HuggingFaceFW/fineweb

🍷 FineWeb

15 trillion tokens of the finest data the 🌐 web has to offer

What is it?

The 🍷 FineWeb dataset consists of more than 18.5T tokens (originally 15T tokens) of cleaned and deduplicated English web data from C...
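Because the corpus is far too large to download casually, a few records can be inspected by streaming them instead. The following is a minimal sketch, assuming the Hugging Face `datasets` library is installed and network access is available. The function name `stream_records`, the `config_name` parameter, and the `train` split are assumptions for illustration and are not stated on this page.

```python
from itertools import islice

# Repository name taken from the page title.
DATASET_ID = "HuggingFaceFW/fineweb"


def stream_records(n=3, config_name=None):
    """Return the first n records, streamed rather than fully downloaded.

    config_name is a hypothetical parameter: None selects the repository's
    default configuration. The "train" split is likewise an assumption.
    """
    # Imported lazily so this module loads even where `datasets` is absent.
    from datasets import load_dataset

    # streaming=True reads the remote Parquet shards lazily instead of
    # materializing the whole dataset on disk first.
    stream = load_dataset(
        DATASET_ID, name=config_name, split="train", streaming=True
    )
    return list(islice(stream, n))
```

Calling `stream_records(3)` would return a list of three dictionaries, one per row. Inspecting their keys is a reasonable first step, since the schema is not listed here.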

Details

Task: Text Generation
Language: EN
Format: Parquet
Rows / instances: N/A
Size: 10B<n<100B
Creator: HuggingFaceFW
Year: 2026
License: odc-by
Downloads: 255018
Likes: 2906