
bigcode/the-stack-v2-dedup

Text Generation, CODE

bigcode/the-stack-v2-dedup is a text-generation dataset of source code (language tag: CODE), distributed in Parquet format. Its license is listed as "other", it falls in the 1B<n<10B size category, and it has been downloaded 19.5K times.

About bigcode/the-stack-v2-dedup

The Stack v2

The dataset consists of 4 versions:

- bigcode/the-stack-v2: the full "The Stack v2" dataset
- bigcode/the-stack-v2-dedup: based on the bigcode/the-stack-v2 but further near-deduplicated <-- you are here
- bigcode/the-stack-v2-t...

Details

Task: Text Generation
Language: CODE
Format: Parquet
Rows / instances: N/A
Size: 1B<n<10B
Creator: bigcode
Year: 2024
License: other
Downloads: 19522
Likes: 132
