OptimalScale/ClimbMix

Text Generation, EN

OptimalScale/ClimbMix is an English-language (EN) text generation dataset distributed in Parquet format.

About OptimalScale/ClimbMix

ClimbMix is a high-quality pre-training corpus released by NVIDIA. Its description reads: "ClimbMix is a compact yet powerful 400-billion-token dataset designed for efficient pre-training that delivers superior performance under an equal token b..."

Details

Task: Text Generation
Language: EN
Format: Parquet
Rows / instances: N/A
Creator: OptimalScale
Year: 2025