
HuggingFaceTB/smoltalk2

General NLP, English

HuggingFaceTB/smoltalk2 is a General NLP dataset in English that provides 8,610,022 labeled examples distributed in Parquet format. It falls in the 1M<n<10M size category and has been downloaded 9.9K times.

About HuggingFaceTB/smoltalk2

SmolTalk2 dataset description: This dataset contains three subsets (Mid, SFT, Preference) that correspond to the three phases of post-training for SmolLM3-3B. You can find more details in our blog post about how we used the data in e...
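As a rough illustration of how one of the three subsets might be opened, here is a minimal sketch using the Hugging Face `datasets` library in streaming mode. It assumes the subset names in the description (Mid, SFT, Preference) are also the configuration names on the Hub, which this page does not confirm. The helper `open_subset` is hypothetical and not part of any library.

```python
# Subset names as listed in the description above. Whether they match the
# Hub configuration names exactly is an assumption.
SUBSETS = ("Mid", "SFT", "Preference")


def open_subset(name, repo_id="HuggingFaceTB/smoltalk2"):
    """Open one subset in streaming mode, so nothing is fully downloaded."""
    if name not in SUBSETS:
        raise ValueError(f"unknown subset {name!r}; expected one of {SUBSETS}")
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    # With no `split` argument, streaming returns a dict-like object of splits.
    return load_dataset(repo_id, name, streaming=True)
```

Calling `open_subset("SFT")` needs network access and the `datasets` package. If the configuration name is right, `list(open_subset("SFT").keys())` should list the available split names. Streaming is used here because the full dataset has several million rows.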

Details

Task: General NLP
Language: English
Format: Parquet
Rows / instances: 8,610,022
Size: 1M<n<10M
Creator: HuggingFaceTB
Year: 2025
Downloads: 9,925
Likes: 160
