allenai/dolmino-mix-1124
Text Generation · EN · odc-by
The allenai/dolmino-mix-1124 dataset is an English (EN) text-generation resource released by allenai in 2024. It has been downloaded 11.1K times and has 97 likes. It is released under the odc-by license and falls in the 100M<n<1B size category.
About allenai/dolmino-mix-1124
DOLMino dataset mix for OLMo2 stage 2 annealing training.
Mixture of high-quality data used for the second stage of OLMo2 training.
Source Sizes
Name | Category | Tokens | Bytes (uncompressed) | Documents | License
DCLM | HQ Web Page...
Details
- Task: Text Generation
- Language: EN
- Format: Parquet
- Rows / instances: N/A
- Size: 100M<n<1B
- Creator: allenai
- Year: 2024
- License: odc-by
- Downloads: 11067
- Likes: 97
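The Size field uses a bucketed label rather than an exact count: 100M<n<1B places the number of instances n between 10^8 and 10^9. A minimal sketch that parses such labels into numeric bounds follows. It assumes the K/M/B/T suffixes mean powers of 1,000 and that labels take one of the forms "a<n<b", "n<b", or "n>a"; the helper names `parse_size_category` and `in_category` are hypothetical and not part of any library.

```python
import re

# Assumed multipliers for the suffixes used in size labels (K = 10^3, M = 10^6, ...).
_SUFFIX = {"": 1, "K": 10**3, "M": 10**6, "B": 10**9, "T": 10**12}


def _to_int(token: str) -> int:
    """Convert a token such as '100M' or '1B' into an integer count."""
    match = re.fullmatch(r"(\d+)([KMBT]?)", token)
    if match is None:
        raise ValueError(f"unrecognized size token: {token!r}")
    number, suffix = match.groups()
    return int(number) * _SUFFIX[suffix]


def parse_size_category(label: str) -> tuple[int, float]:
    """Return (lower, upper) bounds on n for labels like '100M<n<1B', 'n<1K', 'n>1T'."""
    if label.startswith("n<"):
        return 0, _to_int(label[2:])
    if label.startswith("n>"):
        return _to_int(label[2:]), float("inf")
    # Two-sided form: '100M<n<1B' splits into ['100M', 'n', '1B'].
    low, n, high = label.split("<")
    if n != "n":
        raise ValueError(f"unrecognized size label: {label!r}")
    return _to_int(low), _to_int(high)


def in_category(count: int, label: str) -> bool:
    """Check a concrete count against a label, using the label's strict inequalities."""
    low, high = parse_size_category(label)
    return low < count < high


if __name__ == "__main__":
    print(parse_size_category("100M<n<1B"))
    print(in_category(250_000_000, "100M<n<1B"))
```

Strict inequalities are used because that is how the label is written; a count landing exactly on a bucket boundary would match no two-sided label under this sketch.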