bigcode/the-stack-v2-dedup
Text Generation · CODE
bigcode/the-stack-v2-dedup is a text-generation dataset of source code (language tag: CODE), distributed in Parquet format. Its license is listed as "other", it falls in the 1B<n<10B size category, and it has been downloaded 19.5K times.
About bigcode/the-stack-v2-dedup
The Stack v2
The dataset consists of 4 versions:
bigcode/the-stack-v2: the full "The Stack v2" dataset
bigcode/the-stack-v2-dedup: based on bigcode/the-stack-v2, but further near-deduplicated <-- you are here
bigcode/the-stack-v2-t...
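Because the dataset is published under the id bigcode/the-stack-v2-dedup and is large, a reader may want to peek at a few rows without downloading everything. The following is a minimal sketch using the Hugging Face `datasets` library in streaming mode. The helper name `preview_rows` and the `split="train"` default are assumptions for illustration, not something this page states, and access may require accepting the dataset's terms and logging in first.

```python
from itertools import islice

# Dataset id as given on this page.
DATASET_ID = "bigcode/the-stack-v2-dedup"


def preview_rows(n=3, split="train"):
    """Return the first n rows as dicts, streaming instead of downloading.

    `preview_rows` is a hypothetical helper; `split="train"` is an assumption.
    """
    # Imported lazily so this module can be imported without the library installed.
    from datasets import load_dataset

    # streaming=True yields rows on demand rather than fetching every Parquet file.
    ds = load_dataset(DATASET_ID, split=split, streaming=True)
    return list(islice(ds, n))
```

Calling `preview_rows(2)` and printing `sorted(row.keys())` for each row is a quick way to see which columns the Parquet files actually expose before committing to a full download.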
Details
- Task: Text Generation
- Language: CODE
- Format: Parquet
- Rows / instances: N/A
- Size: 1B<n<10B
- Creator: bigcode
- Year: 2024
- License: other
- Downloads: 19522
- Likes: 132
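The size tag 1B<n<10B is a range rather than a count. As a small sketch, assuming `n` denotes the number of instances and that K, M, B, and T stand for powers of one thousand, the tag can be turned into numeric bounds as follows. The function name `parse_size_category` is hypothetical, and open-ended tags (with only one bound) are deliberately not handled.

```python
import re

# Assumed meaning of the suffixes: thousand, million, billion, trillion.
_SUFFIX = {"": 1, "K": 10**3, "M": 10**6, "B": 10**9, "T": 10**12}

_PATTERN = re.compile(r"(\d+)([KMBT]?)<n<(\d+)([KMBT]?)")


def parse_size_category(tag):
    """Convert a tag such as '1B<n<10B' into (lower, upper) integer bounds."""
    match = _PATTERN.fullmatch(tag)
    if match is None:
        raise ValueError(f"unsupported size category: {tag!r}")
    lower = int(match.group(1)) * _SUFFIX[match.group(2)]
    upper = int(match.group(3)) * _SUFFIX[match.group(4)]
    return lower, upper


print(parse_size_category("1B<n<10B"))  # (1000000000, 10000000000)
```

Under these assumptions, the dataset holds between one and ten billion instances, which is consistent with the exact row count being listed as N/A.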