bigcode/the-stack-v2-train-smol-ids
Text Generation, CODE
Created by bigcode in 2024, bigcode/the-stack-v2-train-smol-ids is a text generation dataset, tagged with the language CODE, containing 40,138,809 records in Parquet format. It has 2,060 downloads and 53 likes. Its license is listed as "other", and it falls in the 10M<n<100M size category.
About bigcode/the-stack-v2-train-smol-ids
The Stack v2
The dataset consists of 4 versions:
bigcode/the-stack-v2: the full "The Stack v2" dataset
bigcode/the-stack-v2-dedup: based on bigcode/the-stack-v2, with further near-deduplication
bigcode/the-stack-v2-train-full-ids: ba...
Details
- Task: Text Generation
- Language: CODE
- Format: Parquet
- Rows / instances: 40,138,809
- Size: 10M<n<100M
- Creator: bigcode
- Year: 2024
- License: other
- Downloads: 2,060
- Likes: 53
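The size label 10M<n<100M listed for this dataset is a bucket on the row count rather than an exact figure. As a minimal sketch of how 40,138,809 rows maps to that label, the helper below assumes power-of-ten bucket boundaries written in the same style as the label on this page. The function name `size_category`, the boundary list, and the handling of a count that lands exactly on a boundary are illustrative assumptions, not taken from the dataset card.

```python
# Hypothetical helper: maps a row count to a size label such as "10M<n<100M".
# The power-of-ten boundaries are an assumption modeled on the label style
# shown on this page; they are not taken from the dataset card itself.
BOUNDS = [
    (1_000, "1K"),
    (10_000, "10K"),
    (100_000, "100K"),
    (1_000_000, "1M"),
    (10_000_000, "10M"),
    (100_000_000, "100M"),
    (1_000_000_000, "1B"),
]


def size_category(n_rows: int) -> str:
    """Return the size bucket label for a non-negative row count."""
    if n_rows < 0:
        raise ValueError("row count must be non-negative")
    lower = None  # label of the largest boundary already passed
    for limit, label in BOUNDS:
        if n_rows < limit:
            # Below the first boundary there is no lower bound to print.
            return f"n<{label}" if lower is None else f"{lower}<n<{label}"
        lower = label
    # Larger than every boundary listed above.
    return f"n>{lower}"


if __name__ == "__main__":
    # Row count reported for bigcode/the-stack-v2-train-smol-ids.
    print(size_category(40_138_809))
```

In this sketch a count that equals a boundary exactly (for example 10,000,000) is placed in the bucket above it; that tie-breaking rule is a choice made here for simplicity.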