codeparrot/github-code
Text Generation, CODE
The codeparrot/github-code dataset is a source code dataset for text generation, published by codeparrot in 2022. It has 633.1K downloads and 367 likes. Its license is listed as "other".
About codeparrot/github-code
The GitHub Code dataset consists of 115M code files from GitHub in 32 programming languages with 60 extensions, totaling 1TB of text data. The dataset was created from the GitHub dataset on BigQuery.
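At 1TB, the dataset is usually easier to stream than to download in full. The following is a minimal sketch of that approach, assuming the Hugging Face `datasets` library is installed and that each record carries a `language` field (an assumption based on the dataset's description, not confirmed by this page). The helper names `filter_by_language` and `stream_github_code` are hypothetical, and depending on the installed `datasets` version the load call may need extra arguments.

```python
from __future__ import annotations

from itertools import islice
from typing import Iterable, Iterator


def filter_by_language(records: Iterable[dict], languages: set[str]) -> Iterator[dict]:
    """Yield only the records whose 'language' field is in `languages`.

    The 'language' field name is an assumption about the record schema.
    Records without that field are skipped.
    """
    for record in records:
        if record.get("language") in languages:
            yield record


def stream_github_code(languages: set[str], limit: int = 5) -> list[dict]:
    """Stream the dataset and return up to `limit` records in the given languages."""
    # Imported lazily so filter_by_language works without the library installed.
    from datasets import load_dataset

    # streaming=True iterates over the data without downloading all of it first.
    ds = load_dataset("codeparrot/github-code", split="train", streaming=True)
    return list(islice(filter_by_language(ds, languages), limit))
```

Calling `stream_github_code({"Python"}, limit=3)` would then return the first three matching records found in the stream. Filtering on the client side like this still reads every record up to the last match, so it suits small samples rather than full passes over the data.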
Details
- Task: Text Generation
- Language: CODE
- Format: Parquet
- Rows / instances: N/A
- Creator: codeparrot
- Year: 2022
- License: other
- Downloads: 633092
- Likes: 367