HuggingFaceTB/stack-edu
General NLP, English
HuggingFaceTB/stack-edu is a General NLP dataset in English from HuggingFaceTB with 167,063,359 records in Parquet format. It falls in the 100M<n<1B size category and has been downloaded 3.6K times.
About HuggingFaceTB/stack-edu
💻 Stack-Edu
Stack-Edu is a 125B-token dataset of educational code filtered from The Stack v2, specifically from StarCoder2Data, the curated training corpus of the StarCoder2 models. It is intended for training language models.
This dataset was cura...
Details
- Task: General NLP
- Language: English
- Format: Parquet
- Rows / instances: 167,063,359
- Size: 100M<n<1B
- Creator: HuggingFaceTB
- Year: 2025
- Downloads: 3,632
- Likes: 75
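The size category listed above follows from the row count. As a minimal sketch, the mapping can be checked with a small helper. The function name `size_category`, the bucket table, and the handling of exact boundaries are assumptions made for illustration and are not part of any library; counts of one billion or more are not bucketed further here.

```python
# Hypothetical helper: map a row count to a size-category label
# in the "100M<n<1B" style used in the listing above.
BOUNDS = [
    (10**3, "1K"),
    (10**4, "10K"),
    (10**5, "100K"),
    (10**6, "1M"),
    (10**7, "10M"),
    (10**8, "100M"),
    (10**9, "1B"),
]


def size_category(n_rows: int) -> str:
    """Return the bucket label for n_rows (lower bound inclusive, an assumption)."""
    if n_rows < BOUNDS[0][0]:
        return "n<1K"
    # Walk consecutive pairs of bounds and return the first bucket that fits.
    for (low, low_label), (high, high_label) in zip(BOUNDS, BOUNDS[1:]):
        if low <= n_rows < high:
            return f"{low_label}<n<{high_label}"
    # Simplification: larger counts are not split into finer buckets here.
    return "n>=1B"


if __name__ == "__main__":
    # Row count taken from the listing above.
    print(size_category(167_063_359))
```

Called with the 167,063,359 rows reported for this dataset, the helper returns the same `100M<n<1B` label shown in the details.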