nvidia/OpenCodeInstruct

Created by nvidia in 2025, nvidia/OpenCodeInstruct is an English-language (EN) text generation dataset distributed in Parquet format.

About nvidia/OpenCodeInstruct

OpenCodeInstruct: A Large-scale Instruction Tuning Dataset for Code LLMs

Dataset Description

We introduce OpenCodeInstruct, the largest open-access instruction tuning dataset, comprising 5 million diverse samples. OpenCodeInstruct is...

Details

Task: Text Generation
Language: EN
Format: Parquet
Rows / instances: N/A
Creator: nvidia
Year: 2025