
HuggingFaceTB/smol-smoltalk

General NLP · EN · apache-2.0

HuggingFaceTB/smol-smoltalk is a general NLP dataset in English (EN) from HuggingFaceTB, with 484,570 records in Parquet format. It is distributed under the apache-2.0 license, falls in the 100K<n<1M size category, and has been downloaded 9.9K times.

About HuggingFaceTB/smol-smoltalk

Smol-SmolTalk: This is a subset of the SmolTalk dataset adapted for smol models with fewer than 1B parameters. We used it to build SmolLM2-360M-Instruct and SmolLM2-135M-Instruct. We do SFT on this dataset and then DPO on UltraFeedback. Compared to ...
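
A minimal sketch of pulling the records for SFT, assuming the Hugging Face `datasets` library is installed, the Hub is reachable, and a `train` split exists. The helper names `load_smol_smoltalk` and `role_counts`, and the `messages` column of `{"role", "content"}` dicts that `role_counts` expects, are hypothetical illustrations rather than details confirmed by this page; inspect the loaded columns before relying on them.

```python
from collections import Counter


def load_smol_smoltalk(split: str = "train"):
    """Hypothetical helper: load the dataset from the Hugging Face Hub (needs network)."""
    # Imported lazily so role_counts() works even without `datasets` installed.
    from datasets import load_dataset

    return load_dataset("HuggingFaceTB/smol-smoltalk", split=split)


def role_counts(messages: list[dict]) -> Counter:
    """Hypothetical helper: count turns per role in one conversation.

    Assumes each record stores a `messages` list of {"role": ..., "content": ...}
    dicts, a common chat layout for SFT data; verify against the real schema.
    """
    return Counter(m["role"] for m in messages)
```

Printing the object returned by `load_smol_smoltalk()` shows the actual column names and row count; if a `messages` column is present, `role_counts(ds[0]["messages"])` summarizes the turn structure of the first conversation.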

Details

Task: General NLP
Language: EN
Format: Parquet
Rows / instances: 484,570
Size: 100K<n<1M
Creator: HuggingFaceTB
Year: 2024
License: apache-2.0
Downloads: 9,903
Likes: 104