OptimalScale/ClimbMix
OptimalScale/ClimbMix is an English-language text generation dataset distributed in Parquet format.
About OptimalScale/ClimbMix
ClimbMix is a high-quality pre-training corpus released by NVIDIA. Here is the description:
ClimbMix is a compact yet powerful 400-billion-token dataset designed for efficient pre-training that delivers superior performance under an equal token b...
Details
- Task: Text Generation
- Language: EN
- Format: Parquet
- Rows / instances: N/A
- Creator: OptimalScale
- Year: 2025