bigcode/the-stack-v2-train-full-ids

Text Generation, CODE

The bigcode/the-stack-v2-train-full-ids dataset is a code dataset for text generation, released by bigcode in 2024 and comprising 60,523,556 examples. It has 365 downloads and 60 likes. It is released under a license categorized as "other" and falls in the 10M<n<100M size category.

About bigcode/the-stack-v2-train-full-ids

The Stack v2. The dataset consists of 4 versions: bigcode/the-stack-v2: the full "The Stack v2" dataset; bigcode/the-stack-v2-dedup: based on bigcode/the-stack-v2 but further near-deduplicated; bigcode/the-stack-v2-train-full-ids: ba...

Details

Task: Text Generation
Language: CODE
Format: Parquet
Rows / instances: 60,523,556
Size: 10M<n<100M
Creator: bigcode
Year: 2024
License: other
Downloads: 365
Likes: 60
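The Size field is a bucket label, not a count. The minimal sketch below shows how the reported row count maps onto it. It assumes the Hugging Face Hub's powers-of-ten size-category labels (n<1K, 1K<n<10K, ..., n>1T). The helper name size_category and the bucket table are hypothetical, written for illustration and not taken from any library.

```python
# Hypothetical helper: maps a row count to a Hub-style size-category label.
# Assumption: buckets are powers of ten, and a count exactly on a boundary
# (e.g. 1,000) is placed in the higher bucket.
_BUCKETS = [
    (10**3, "n<1K"),
    (10**4, "1K<n<10K"),
    (10**5, "10K<n<100K"),
    (10**6, "100K<n<1M"),
    (10**7, "1M<n<10M"),
    (10**8, "10M<n<100M"),
    (10**9, "100M<n<1B"),
    (10**10, "1B<n<10B"),
    (10**11, "10B<n<100B"),
    (10**12, "100B<n<1T"),
]


def size_category(n_rows: int) -> str:
    """Return the first label whose exclusive upper bound exceeds n_rows."""
    if n_rows < 0:
        raise ValueError("row count must be non-negative")
    for upper, label in _BUCKETS:
        if n_rows < upper:
            return label
    return "n>1T"


rows = 60_523_556  # row count reported on this page
print(size_category(rows))  # 10M<n<100M
```

Because 60,523,556 lies between 10^7 and 10^8, the sketch reproduces the 10M<n<100M label shown above.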
