m-a-p/Matrix
Text Generation | EN, ZH
m-a-p/Matrix is a text-generation dataset in English (EN) and Chinese (ZH), distributed in Parquet format.
About m-a-p/Matrix
Matrix
Matrix is an open-source pretraining dataset containing 4,690 billion tokens. It is bilingual, with both English and Chinese texts, and is used for training neo models.
Dataset Composition
The dataset consists of several components, eac...
Details
- Task: Text Generation
- Language: EN, ZH
- Format: Parquet
- Rows / instances: N/A
- Creator: m-a-p
- Year: 2024