
nvidia/Nemotron-CC-v2

Text Generation, English

nvidia/Nemotron-CC-v2 is an English-language text generation dataset distributed in Parquet format.

About nvidia/Nemotron-CC-v2

Nemotron-Pre-Training-Dataset-v1 Release. Data Overview: This pretraining dataset, for generative AI model training, preserves high-value math and code while enriching it with diverse multilingual Q&A, fueling the next generation of in...

Details

Task: Text Generation
Language: English
Format: Parquet
Rows / instances: N/A
Creator: nvidia
Year: 2025