
bigcode/the-stack-smol

Text Generation, CODE

bigcode/the-stack-smol is a text-generation dataset of source code (language tag: CODE), distributed in Parquet format. It falls in the 100K<n<1M size category and has been downloaded about 12.4K times.

About bigcode/the-stack-smol

Dataset Description: A small subset (~0.1%) of the-stack dataset, in which each programming language has 10,000 random samples drawn from the original dataset. The dataset has 2.6GB of text (code).

Languages: The dataset contains 30 programming ...
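The description above says each language contributes a fixed number of random samples. The sketch below shows one way such a per-language subset could be drawn. It is an illustration only, not the procedure the dataset creators used: the function name `sample_per_language`, its parameters, and the toy corpus are all hypothetical.

```python
import random
from typing import Dict, List


def sample_per_language(
    files_by_language: Dict[str, List[str]],
    samples_per_language: int = 10_000,
    seed: int = 0,
) -> Dict[str, List[str]]:
    """Draw up to `samples_per_language` files at random from each language.

    Hypothetical helper for illustration; not the dataset creators' code.
    """
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    subset = {}
    for language, files in files_by_language.items():
        # A language with fewer files than requested contributes all of them.
        k = min(samples_per_language, len(files))
        subset[language] = rng.sample(files, k)  # sampling without replacement
    return subset


# Toy corpus (made up for illustration): 3 languages with 50 files each.
toy_corpus = {
    lang: [f"{lang}_file_{i}" for i in range(50)]
    for lang in ("python", "rust", "go")
}
toy_subset = sample_per_language(toy_corpus, samples_per_language=10, seed=42)
total_rows = sum(len(files) for files in toy_subset.values())
print(total_rows)  # 3 languages x 10 samples = 30
```

By the same arithmetic, 30 languages with 10,000 samples each would give at most 300,000 rows, which is consistent with the listed 100K<n<1M size category. The listing itself does not state an exact row count.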

Details

Task: Text Generation
Language: CODE
Format: Parquet
Rows / instances: N/A
Size: 100K<n<1M
Creator: bigcode
Year: 2022
Downloads: 12390
Likes: 84
