bigcode/the-stack-v2-train-full-ids
Text Generation, CODE
The bigcode/the-stack-v2-train-full-ids dataset is a code dataset for text generation, published by bigcode in 2024 and comprising 60,523,556 examples. It has 365 downloads and 60 likes. Its license is listed as "other", and it falls in the 10M<n<100M size category.
About bigcode/the-stack-v2-train-full-ids
The Stack v2
The dataset consists of 4 versions:
- bigcode/the-stack-v2: the full "The Stack v2" dataset
- bigcode/the-stack-v2-dedup: based on bigcode/the-stack-v2 but further near-deduplicated
- bigcode/the-stack-v2-train-full-ids: ba...
Details
- Task: Text Generation
- Language: CODE
- Format: Parquet
- Rows / instances: 60,523,556
- Size: 10M<n<100M
- Creator: bigcode
- Year: 2024
- License: other
- Downloads: 365
- Likes: 60
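
The "Size" value above is a bucket label derived from the row count. As a minimal sketch of how such a label could be computed, the helper below maps a row count to a label of the same form. The function name `size_bucket`, the list of bounds, and the handling of exact boundary values are assumptions made for illustration; they are not taken from the dataset card or from any library.

```python
# Hypothetical helper: map a row count to a size-category label such as
# "10M<n<100M". The bounds below are an assumed set of power-of-ten buckets.
BOUNDS = [
    (1_000, "1K"),
    (10_000, "10K"),
    (100_000, "100K"),
    (1_000_000, "1M"),
    (10_000_000, "10M"),
    (100_000_000, "100M"),
    (1_000_000_000, "1B"),
]


def size_bucket(n: int) -> str:
    """Return the bucket label for a dataset with n rows."""
    if n < 0:
        raise ValueError("row count must be non-negative")
    lower = None  # label of the largest bound that n has reached so far
    for bound, label in BOUNDS:
        if n < bound:
            # Below the first bound there is no lower label to print.
            return f"n<{label}" if lower is None else f"{lower}<n<{label}"
        lower = label
    # Assumption: anything at or above the last bound gets an open-ended label.
    return f"n>{lower}"


# Row count taken from the Details list above.
print(size_bucket(60_523_556))
```

In this sketch a count that sits exactly on a boundary (for example 10,000,000) is placed in the upper bucket; that choice is arbitrary and may differ from how the listing site assigns labels.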