nvidia/Nemotron-ClimbMix
Text Generation, EN
nvidia/Nemotron-ClimbMix is an English-language (EN) text-generation dataset distributed in Parquet format.
About nvidia/Nemotron-ClimbMix
ClimbMix Dataset
Creating the highest-quality pre-training datasets for LLMs
PAPER
CLIMBLAB
CLIMBMIX
HOMEPAGE
Figure 1: Continuously training a 1B model yields a 2.0% imp...
Details
- Task: Text Generation
- Language: EN
- Format: Parquet
- Rows / instances: N/A
- Creator: nvidia
- Year: 2025
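
Since the card lists the format as Parquet but gives no row count or column names, one way to inspect the data is to stream a few rows instead of downloading everything. The following is a minimal sketch, not an official loading recipe. It assumes the dataset is hosted on the Hugging Face Hub under the ID `nvidia/Nemotron-ClimbMix`, that it has a `train` split, and that the third-party `datasets` library is installed. The helper names `stream_rows` and `preview` are hypothetical.

```python
from itertools import islice
from typing import Any, Iterable, Iterator

# Assumption: the dataset lives on the Hugging Face Hub under this ID.
REPO_ID = "nvidia/Nemotron-ClimbMix"


def stream_rows(repo_id: str = REPO_ID, split: str = "train") -> Iterator[dict]:
    """Lazily yield rows without downloading the whole dataset.

    The split name "train" is an assumption; the card does not list splits.
    """
    # Imported here so that preview() works even without `datasets` installed.
    from datasets import load_dataset

    # streaming=True returns an iterable dataset that fetches data on demand.
    yield from load_dataset(repo_id, split=split, streaming=True)


def preview(rows: Iterable[Any], n: int = 3) -> list:
    """Return the first n items of any iterable as a list."""
    return list(islice(rows, n))


if __name__ == "__main__":
    for row in preview(stream_rows(), n=2):
        # Column names are not given in the card, so only the keys are printed.
        print(sorted(row.keys()))
```

Streaming keeps memory and disk use small, which matters for a pre-training corpus whose size is not stated here. If the split name or repository ID differs, only the two default arguments of `stream_rows` need to change.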