EssentialAI/essential-web-v1.0

General NLP | English | odc-by

EssentialAI/essential-web-v1.0 is an English General NLP dataset from EssentialAI, distributed in Parquet format under the odc-by license. It falls in the 10B<n<100B size category and has been downloaded 345.2K times.

About EssentialAI/essential-web-v1.0

🌐 Essential-Web: Complete 24-Trillion Token Dataset

Website | Code | Paper | AWS

Dataset Description: Essential-Web is a 24-trillion-token web dataset with document-level metadata designed for flexible dataset ...

Details

Task: General NLP
Language: English
Format: Parquet
Rows / instances: N/A
Size: 10B<n<100B
Creator: EssentialAI
Year: 2026
License: odc-by
Downloads: 345,191
Likes: 227