
bigcode/the-stack-v2-train-smol-ids

Text Generation | CODE

Created by bigcode in 2024, bigcode/the-stack-v2-train-smol-ids is a text generation dataset of source code (language tag: CODE) containing 40,138,809 records in Parquet format. It has roughly 2.1K downloads and 53 likes. It is released under a license tagged "other" and falls in the 10M<n<100M size category.
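The size category follows from the record count. A minimal sketch of that mapping is given here, assuming the buckets are split at powers of ten; the function name `size_category` and the full list of bucket labels are illustrative assumptions, and only the `10M<n<100M` label comes from this page.

```python
def size_category(num_rows: int) -> str:
    """Return a size-bucket label for a row count (illustrative helper)."""
    # Upper bounds are exclusive; labels other than "10M<n<100M" are assumed.
    bounds = [
        (1_000, "n<1K"),
        (10_000, "1K<n<10K"),
        (100_000, "10K<n<100K"),
        (1_000_000, "100K<n<1M"),
        (10_000_000, "1M<n<10M"),
        (100_000_000, "10M<n<100M"),
        (1_000_000_000, "100M<n<1B"),
    ]
    for upper, label in bounds:
        if num_rows < upper:
            return label
    return "n>1B"


# Record count listed for bigcode/the-stack-v2-train-smol-ids.
print(size_category(40_138_809))  # prints 10M<n<100M
```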

About bigcode/the-stack-v2-train-smol-ids

The Stack v2

The dataset consists of 4 versions:
- bigcode/the-stack-v2: the full "The Stack v2" dataset
- bigcode/the-stack-v2-dedup: based on bigcode/the-stack-v2 but further near-deduplicated
- bigcode/the-stack-v2-train-full-ids: ba...

Details

Task: Text Generation
Language: CODE
Format: Parquet
Rows / instances: 40,138,809
Size: 10M<n<100M
Creator: bigcode
Year: 2024
License: other
Downloads: 2,060
Likes: 53