bigcode/the-stack-v2

Text Generation · CODE

The bigcode/the-stack-v2 dataset is a source-code (CODE) text generation dataset released by bigcode in 2024. It has about 29.1K downloads and 590 likes. Its license is listed as "other", and its size category is 1B<n<10B (between 1 billion and 10 billion instances).
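The size tag 1B<n<10B can be read as a pair of numeric bounds. The sketch below is a minimal illustration, assuming the tag follows the usual "lower<n<upper" size-category pattern with K, M, B, and T suffixes; `parse_size_category` is a hypothetical helper written for this page, not part of any library.

```python
import re

# Assumed suffix multipliers: K = 1e3, M = 1e6, B = 1e9, T = 1e12.
_SUFFIX = {"": 1, "K": 10**3, "M": 10**6, "B": 10**9, "T": 10**12}


def parse_size_category(tag: str) -> tuple[int, int]:
    """Return (lower, upper) bounds for a tag such as '1B<n<10B'.

    Hypothetical helper: it only handles the two-sided 'lower<n<upper' form.
    """
    match = re.fullmatch(r"(\d+)([KMBT]?)<n<(\d+)([KMBT]?)", tag)
    if match is None:
        raise ValueError(f"unrecognized size tag: {tag!r}")
    lo_num, lo_suffix, hi_num, hi_suffix = match.groups()
    return int(lo_num) * _SUFFIX[lo_suffix], int(hi_num) * _SUFFIX[hi_suffix]


lower, upper = parse_size_category("1B<n<10B")
print(f"{lower:,} < n < {upper:,}")  # prints: 1,000,000,000 < n < 10,000,000,000
```

The bounds describe a bucket, not an exact count, so they are only useful for rough capacity planning.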

About bigcode/the-stack-v2

The Stack v2

The dataset consists of 4 versions:
- bigcode/the-stack-v2: the full "The Stack v2" dataset (you are here)
- bigcode/the-stack-v2-dedup: based on bigcode/the-stack-v2 but further near-deduplicated
- bigcode/the-stack-v2-tr...

Details

Task: Text Generation
Language: CODE
Format: Parquet
Rows / instances: N/A
Size: 1B<n<10B
Creator: bigcode
Year: 2024
License: other
Downloads: 29055
Likes: 590
