ajibawa-2023/Java-Code-Large

Text Generation, EN, mit

ajibawa-2023/Java-Code-Large is an English-language (EN) text generation dataset published by ajibawa-2023 in Parquet format. It is distributed under the MIT license, falls in the 10M<n<100M size category, and has been downloaded about 1.2K times.

About ajibawa-2023/Java-Code-Large

Java-Code-Large is a large-scale corpus of publicly available Java source code comprising more than 15 million Java code samples. The dataset is designed to support research in large language model (LLM) pretraining, code intelligence, so...

Details

Task: Text Generation
Language: EN
Format: Parquet
Rows / instances: N/A
Size: 10M<n<100M
Creator: ajibawa-2023
Year: 2026
License: mit
Downloads: 1213
Likes: 32
