hasankursun/github-code-2025-language-split

General NLP · English

Created by hasankursun in 2025, hasankursun/github-code-2025-language-split is a General NLP dataset in English, distributed in Parquet format. It has 15.8K downloads and 10 likes. Its license is listed as "other", and it falls in the 100M<n<1B size category.

About hasankursun/github-code-2025-language-split

📜 Source Data & Attribution: This dataset is a processed derivative of nick007x/github-code-2025. Origination: The original data was aggregated by nick007x from public GitHub repositories. We have retained the original content, fil...

Details

Task: General NLP
Language: English
Format: Parquet
Rows / instances: N/A
Size: 100M<n<1B
Creator: hasankursun
Year: 2025
License: other
Downloads: 15819
Likes: 10
