EssentialAI/essential-web-v1.0
EssentialAI/essential-web-v1.0 is a General NLP dataset in English from EssentialAI, distributed in Parquet format under the odc-by license. It falls in the 10B<n<100B size category and has been downloaded 345.2K times.
About EssentialAI/essential-web-v1.0
Essential-Web: Complete 24-Trillion Token Dataset
Website | Code | Paper | AWS
Dataset Description
Essential-Web is a 24-trillion-token web dataset with document-level metadata designed for flexible dataset ...
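As a rough sense of scale, and assuming the 10B<n<100B size category counts documents (rows), as Hugging Face size categories count examples, the roughly 24-trillion-token total T implies an average document length between about 240 and 2,400 tokens:

```latex
\bar{L} = \frac{T}{n}, \qquad T \approx 24 \times 10^{12}, \qquad 10 \times 10^{9} < n < 100 \times 10^{9}
\;\Longrightarrow\;
240 = \frac{24 \times 10^{12}}{100 \times 10^{9}} \;<\; \bar{L} \;<\; \frac{24 \times 10^{12}}{10 \times 10^{9}} = 2400
```

This is a back-of-the-envelope bound only; the page lists the row count as N/A, so the exact average is not given here.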
Details
- Task: General NLP
- Language: English
- Format: Parquet
- Rows / instances: N/A
- Size: 10B<n<100B
- Creator: EssentialAI
- Year: 2026
- License: odc-by
- Downloads: 345191
- Likes: 227
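Because the data is stored as Parquet on the Hugging Face Hub, one way to inspect a few records without downloading the full corpus is to stream it with the `datasets` library. The sketch below assumes `datasets` is installed and that the repository exposes a `train` split; the helper name `preview_documents` is hypothetical, and since column names are not listed on this page, the code does not assume any.

```python
from itertools import islice


def preview_documents(n=3, repo_id="EssentialAI/essential-web-v1.0"):
    """Return the first n records of the dataset via streaming.

    Hypothetical helper. Assumes the Hugging Face `datasets` library is
    installed and that the repository exposes a "train" split.
    """
    if n < 0:
        raise ValueError("n must be non-negative")

    # Imported inside the function so this file can be imported even where
    # `datasets` is not installed; the network is only touched when called.
    from datasets import load_dataset

    # streaming=True returns an IterableDataset that reads the Parquet data
    # on demand instead of downloading the whole corpus first.
    stream = load_dataset(repo_id, split="train", streaming=True)

    # Each streamed record is a dict mapping column name -> value.
    return list(islice(stream, n))
```

For example, `preview_documents(2)` returns the first two records as Python dictionaries, and `sorted(preview_documents(1)[0])` lists the available column names. Streaming trades random access for not having to store the full dataset locally.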