
HuggingFaceTB/dclm-edu

General NLP | EN | cc-by-4.0

HuggingFaceTB/dclm-edu is an English-language general NLP dataset from HuggingFaceTB, distributed in Parquet format under the cc-by-4.0 license. It falls in the 1B<n<10B size category and has been downloaded 7.3K times.

About HuggingFaceTB/dclm-edu

DCLM-Edu. Description: This is a filtered version of the DCLM dataset, produced with the FineWeb-Edu educational quality classifier. We annotate each web page based on its educational quality on a scale from 0 to 5 and only keep samples with a score hi...
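The keep/drop rule described above can be sketched as a simple threshold filter over scored pages. This is a minimal illustration, not the creators' pipeline: the classifier itself is not reproduced, the field name `score` and the helper `filter_by_edu_score` are hypothetical, and because the actual cutoff is truncated in the description, `SCORE_THRESHOLD = 3` and the inclusive `>=` comparison are placeholders only.

```python
from typing import Iterable, Iterator

# HYPOTHETICAL placeholder: the real cutoff is truncated in the dataset
# description, so this value only exists to make the sketch runnable.
SCORE_THRESHOLD = 3


def filter_by_edu_score(
    pages: Iterable[dict],
    threshold: float = SCORE_THRESHOLD,
    key: str = "score",  # hypothetical field name; the real column may differ
) -> Iterator[dict]:
    """Yield pages whose educational-quality score (0 to 5) meets the threshold.

    Assumes each page already carries a classifier score; whether the real
    filter uses >= or > is unknown, and this sketch uses >=.
    """
    for page in pages:
        score = page[key]
        # Reject scores outside the documented 0-5 scale.
        if not 0 <= score <= 5:
            raise ValueError(f"score {score} is outside the 0-5 scale")
        if score >= threshold:
            yield page


if __name__ == "__main__":
    pages = [
        {"text": "a", "score": 1.2},
        {"text": "b", "score": 3.7},
        {"text": "c", "score": 4.9},
    ]
    print([p["text"] for p in filter_by_edu_score(pages)])  # ['b', 'c']
```

A generator keeps memory flat, which matters for a corpus in the 1B<n<10B range where pages would typically be streamed rather than loaded at once.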

Details

Task: General NLP
Language: EN (English)
Format: Parquet
Rows / instances: N/A
Size: 1B<n<10B
Creator: HuggingFaceTB
Year: 2025
License: cc-by-4.0
Downloads: 7345
Likes: 38
