lucius1022/DeMix_Corpora
General NLP, English
lucius1022/DeMix_Corpora is an English-language General NLP dataset in Parquet format, created by lucius1022 in 2026.
About lucius1022/DeMix_Corpora
Dataset Card for DeMix Corpora
DeMix
Paper: Decouple Searching from Training: Scaling Data Mixing via Model Merging for Large Language Model Pre-training
Dataset: DeMix Corpora
GitHub: Demix
Dataset Details
...
Details
- Task: General NLP
- Language: English
- Format: Parquet
- Rows / instances: N/A
- Creator: lucius1022
- Year: 2026