nvidia/Nemotron-CC-v2
Text Generation | English
nvidia/Nemotron-CC-v2 is an English-language dataset for text generation, distributed in Parquet format.
About nvidia/Nemotron-CC-v2
Nemotron-Pre-Training-Dataset-v1 Release
Data Overview
This pretraining dataset, for generative AI model training, preserves high-value math and code while enriching it with diverse multilingual Q&A, fueling the next generation of in...
Details
- Task: Text Generation
- Language: English
- Format: Parquet
- Rows / instances: N/A
- Creator: nvidia
- Year: 2025
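Because the data ships as Parquet, a quick sanity check on a downloaded shard can catch truncated or corrupted downloads before any heavier processing. The sketch below assumes only the layout from the Apache Parquet file format specification: every file starts and ends with the 4-byte magic `PAR1`, and the 4 bytes just before the trailing magic hold the little-endian length of the file metadata footer. The function name `parquet_footer_length` and the shard filename in the usage note are hypothetical, not part of this dataset's tooling. The check reads only a few bytes rather than loading the whole file.

```python
import os
import struct

# 4-byte magic that begins and ends every Parquet file (per the format spec)
PARQUET_MAGIC = b"PAR1"


def parquet_footer_length(path: str) -> int:
    """Return the byte length of the Parquet file-metadata footer.

    Expected layout: "PAR1" <column data> <FileMetaData>
    <4-byte little-endian metadata length> "PAR1".
    Raises ValueError if the file does not match that layout.
    """
    size = os.path.getsize(path)
    # Minimum: leading magic (4) + length field (4) + trailing magic (4)
    if size < 12:
        raise ValueError(f"{path}: too small to be a Parquet file")

    with open(path, "rb") as f:
        head = f.read(4)            # leading magic
        f.seek(-8, os.SEEK_END)     # jump to length field + trailing magic
        tail = f.read(8)

    if head != PARQUET_MAGIC or tail[4:] != PARQUET_MAGIC:
        raise ValueError(f"{path}: missing PAR1 magic bytes")

    (meta_len,) = struct.unpack("<I", tail[:4])
    # The footer must fit between the leading magic and the length field
    if meta_len > size - 12:
        raise ValueError(f"{path}: footer length {meta_len} exceeds file size")
    return meta_len
```

For example, `parquet_footer_length("shard-00000.parquet")` (a hypothetical local file) either returns the footer size or raises `ValueError`. This only validates the container; reading actual rows would go through a Parquet library such as `pyarrow.parquet.read_table`.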