nvidia/Nemotron-Pretraining-Dataset-sample

General NLP, English

nvidia/Nemotron-Pretraining-Dataset-sample is an English-language general NLP dataset published by nvidia in Parquet format.

About nvidia/Nemotron-Pretraining-Dataset-sample

Nemotron-Pre-Training-Dataset-v1 Release. Data Overview: This pretraining dataset, intended for generative AI model training, preserves high-value math and code while enriching it with diverse multilingual Q&A, fueling the next generation of in...

Details

Task: General NLP
Language: English
Format: Parquet
Rows / instances: N/A
Creator: nvidia
Year: 2025