allenai/dolma3_longmino_mix-50B-1025
General NLP | EN | odc-by
Created by allenai in 2025, allenai/dolma3_longmino_mix-50B-1025 is a General NLP dataset in English (EN), distributed in Parquet format. It has about 33K downloads and 10 likes and is released under the odc-by license.
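Since the mix is large (the "50B" in its name), streaming a few records is usually more practical than downloading every Parquet shard. The following is a minimal sketch that assumes the Hugging Face `datasets` library is installed, that the repository exposes a default `train` split, and that records carry a `text` column. The split name, the column name, and the helper names `stream_documents` and `preview` are hypothetical illustrations, not documented parts of this dataset.

```python
from itertools import islice

# Repository id as listed on this page.
REPO_ID = "allenai/dolma3_longmino_mix-50B-1025"


def stream_documents(repo_id: str = REPO_ID, split: str = "train"):
    """Return an iterable over records without downloading the full mix.

    Needs the `datasets` package and network access. The split name
    "train" is an assumption, not confirmed by the dataset card.
    """
    # Imported lazily so the helper below works without `datasets` installed.
    from datasets import load_dataset

    # streaming=True yields records on the fly instead of caching all shards.
    return load_dataset(repo_id, split=split, streaming=True)


def preview(records, n: int = 3, field: str = "text", width: int = 80):
    """Return the first `n` values of `field`, each cut to `width` characters.

    `field` defaults to "text", a hypothetical column name; records that
    lack it contribute an empty string.
    """
    snippets = []
    for record in islice(records, n):
        snippets.append(str(record.get(field, ""))[:width])
    return snippets
```

Calling `preview(stream_documents(), n=2)` reads only the first two records rather than the whole mix. Keeping the network-dependent loader separate from the pure `preview` helper also makes the helper easy to check on small in-memory lists of dicts.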
About allenai/dolma3_longmino_mix-50B-1025
Dolma 3 Longmino Mix (50B)
The Dolma 3 Longmino Mix (50B) is the data mixture used for the third stage of training of the Olmo 3 7B model.
Dataset Sources
| Source | Type | Tokens | Docs |
|---|---|---|---|
| LC-s2pdf-REX 32k-64k | Synth PDFs | 6.08B (12.... |
Details
- Task: General NLP
- Language: EN
- Format: Parquet
- Rows / instances: N/A
- Creator: allenai
- Year: 2025
- License: odc-by
- Downloads: 32957
- Likes: 10