HuggingFaceTB/stack-edu

General NLP · English

HuggingFaceTB/stack-edu is a General NLP dataset in English from HuggingFaceTB with 167,063,359 records in Parquet format. It falls in the 100M<n<1B size category and has been downloaded 3.6K times.
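The size category is a bucket derived from the row count. The sketch below shows how 167,063,359 records map to "100M<n<1B". It assumes the power-of-ten buckets used by the Hugging Face Hub `size_categories` metadata field. The helper name `size_category` is hypothetical, and so is the handling of counts that land exactly on a boundary.

```python
# Hypothetical helper: maps a row count to a Hugging Face Hub-style
# size_categories label. The bucket list is assumed to follow the Hub's
# power-of-ten convention; exact-boundary handling is an assumption.
BUCKETS = [
    (1_000, "n<1K"),
    (10_000, "1K<n<10K"),
    (100_000, "10K<n<100K"),
    (1_000_000, "100K<n<1M"),
    (10_000_000, "1M<n<10M"),
    (100_000_000, "10M<n<100M"),
    (1_000_000_000, "100M<n<1B"),
    (10_000_000_000, "1B<n<10B"),
    (100_000_000_000, "10B<n<100B"),
    (1_000_000_000_000, "100B<n<1T"),
]


def size_category(num_rows: int) -> str:
    """Return the first bucket whose (exclusive) upper bound exceeds num_rows."""
    if num_rows < 0:
        raise ValueError("row count must be non-negative")
    for upper, label in BUCKETS:
        if num_rows < upper:
            return label
    return "n>1T"


print(size_category(167_063_359))  # 100M<n<1B
```

With 167,063,359 rows, the first upper bound the count falls below is 1,000,000,000, so the dataset lands in the 100M<n<1B bucket.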

About HuggingFaceTB/stack-edu

💻 Stack-Edu

Stack-Edu is a 125B-token dataset of educational code filtered from The Stack v2, specifically from StarCoder2Data, the curated training corpus of the StarCoder2 models. It is intended for language model training. This dataset was cura...

Details

Task: General NLP
Language: English
Format: Parquet
Rows / instances: 167,063,359
Size: 100M<n<1B
Creator: HuggingFaceTB
Year: 2025
Downloads: 3,632
Likes: 75