Skip to content

HuggingFaceTB/smoltalk

General NLPEN

HuggingFaceTB/smoltalk is a General NLP dataset in EN from HuggingFaceTB in Parquet format. And falls in the 1M<n<10M size category, and has been downloaded 18.1K times.

About HuggingFaceTB/smoltalk

SmolTalk Dataset description This is a synthetic dataset designed for supervised finetuning (SFT) of LLMs. It was used to build SmolLM2-Instruct family of models and contains 1M samples. More details in our paper https://arxiv.org/a...

Details

Task
General NLP
Language
EN
Format
Parquet
Rows / instances
N/A
Size
1M<n<10M
Creator
HuggingFaceTB
Year
2024
Downloads
18134
Likes
415
Download Homepage

Related General NLP datasets

FAQ