nvidia/Nemotron-Pretraining-Dataset-sample
nvidia/Nemotron-Pretraining-Dataset-sample is a General NLP dataset in English, published by nvidia in Parquet format.
About nvidia/Nemotron-Pretraining-Dataset-sample
Nemotron-Pre-Training-Dataset-v1 Release
Data Overview
This pretraining dataset for generative AI model training preserves high-value math and code while enriching it with diverse multilingual Q&A, fueling the next generation of in...
Details
- Task: General NLP
- Language: English
- Format: Parquet
- Rows / instances: N/A
- Creator: nvidia
- Year: 2025