
m-a-p/Matrix

Text Generation | EN, ZH

m-a-p/Matrix is a text generation dataset in English and Chinese (EN, ZH), distributed in Parquet format.

About m-a-p/Matrix

Matrix

An open-source pretraining dataset containing 4690 billion tokens. This bilingual dataset, with both English and Chinese texts, is used for training neo models.

Dataset Composition

The dataset consists of several components, eac...

Details

Task: Text Generation
Language: EN, ZH
Format: Parquet
Rows / instances: N/A
Creator: m-a-p
Year: 2024
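
Because the dataset is large (4690 billion tokens), it is usually more practical to stream a few records than to download everything. The following is a minimal sketch, not an official loading recipe. It assumes the dataset is hosted on the Hugging Face Hub under the identifier shown on this page, that the `datasets` library is installed, and that a split named "train" exists. The helper name `peek` is hypothetical, and the column names are not listed here, so the sketch does not guess them.

```python
from itertools import islice

REPO_ID = "m-a-p/Matrix"  # dataset identifier as shown on this page


def peek(repo_id=REPO_ID, split="train", n=3):
    """Return the first n records without downloading the whole dataset.

    Assumptions: the dataset is on the Hugging Face Hub and has a split
    called "train"; adjust `split` if the dataset page says otherwise.
    """
    # Imported lazily so this module still loads where `datasets` is missing.
    from datasets import load_dataset

    # streaming=True iterates over the remote files instead of fetching them all first.
    stream = load_dataset(repo_id, split=split, streaming=True)

    # islice stops after n records, so only a small amount of data is read.
    return list(islice(stream, n))
```

Calling `peek()` requires network access and returns a list of dictionaries, one per record. Inspecting `record.keys()` on the first item is a safe way to discover the actual columns.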