HuggingFaceTB/dclm-edu
General NLP | EN | cc-by-4.0
HuggingFaceTB/dclm-edu is an English-language (EN) General NLP dataset from HuggingFaceTB, distributed in Parquet format under the cc-by-4.0 license. It falls in the 1B<n<10B size category and has been downloaded 7.3K times.
About HuggingFaceTB/dclm-edu
DCLM-Edu
Description
This is a filtered version of the DCLM dataset, produced with the FineWeb-Edu educational quality classifier. We annotate each web page for educational quality on a scale from 0 to 5 and only keep samples with a score hi...
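The score-based filtering described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the record layout (`text` and `score` keys), the helper name `filter_by_score`, and the cutoff `EXAMPLE_CUTOFF` are all hypothetical. The actual threshold is truncated in the description, so the value below is a placeholder, and a strict "greater than" comparison is assumed.

```python
from typing import Dict, Iterable, Iterator


def filter_by_score(records: Iterable[Dict], min_score: float) -> Iterator[Dict]:
    """Yield only records whose educational score is strictly above min_score.

    Assumes each record has a numeric "score" key on the 0 to 5 scale
    (hypothetical field name, chosen for illustration).
    """
    for record in records:
        if record["score"] > min_score:
            yield record


# Hypothetical annotated web pages; the scores are made up for the example.
pages = [
    {"text": "How photosynthesis works", "score": 4.2},
    {"text": "Buy cheap watches", "score": 0.3},
    {"text": "Notes on linear algebra", "score": 3.0},
]

# Placeholder cutoff, NOT the dataset's real threshold (elided in the source).
EXAMPLE_CUTOFF = 3.0

kept = list(filter_by_score(pages, EXAMPLE_CUTOFF))
```

Because the helper is a generator, the same function could in principle be applied to a streamed dataset without holding every row in memory.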
Details
- Task: General NLP
- Language: EN
- Format: Parquet
- Rows / instances: N/A
- Size: 1B<n<10B
- Creator: HuggingFaceTB
- Year: 2025
- License: cc-by-4.0
- Downloads: 7345
- Likes: 38