HuggingFaceTB/smoltalk2
General NLP, English
HuggingFaceTB/smoltalk2 is an English-language general NLP dataset that provides 8,610,022 labeled examples distributed in Parquet format. It falls in the 1M<n<10M size category and has been downloaded about 9.9K times.
About HuggingFaceTB/smoltalk2
SmolTalk2
Dataset description
This dataset contains three subsets (Mid, SFT, Preference) that correspond to the three phases of post-training for SmolLM3-3B. You can find more details in our blog post about how we used the data in e...
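As a rough illustration of working with the three subsets, the sketch below loads one of them with the Hugging Face `datasets` library. It assumes that the Hub configuration names match the subset names given above (`Mid`, `SFT`, `Preference`) and that `datasets` is installed; the helper `load_subset` is a hypothetical name introduced here, not part of the dataset or the library.

```python
# Subset names as listed in the dataset description.
SUBSETS = ("Mid", "SFT", "Preference")
REPO_ID = "HuggingFaceTB/smoltalk2"


def load_subset(name, streaming=True):
    """Load one SmolTalk2 subset by name (hypothetical helper).

    Streaming is on by default so that the Parquet files are read
    lazily instead of being downloaded in full up front.
    """
    if name not in SUBSETS:
        raise ValueError(f"unknown subset {name!r}; expected one of {SUBSETS}")
    # Imported lazily so the name check works even without the library installed.
    from datasets import load_dataset

    # Assumption: each subset is exposed as a config with exactly this name.
    return load_dataset(REPO_ID, name, streaming=streaming)
```

Calling `load_subset("SFT")` without a split argument should return a dict-like object keyed by split name, so iterating over it is one way to see which splits a subset offers before picking one.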
Details
- Task: General NLP
- Language: English
- Format: Parquet
- Rows / instances: 8,610,022
- Size: 1M<n<10M
- Creator: HuggingFaceTB
- Year: 2025
- Downloads: 9,925
- Likes: 160