m-a-p/MAP-CC

General NLP | Chinese

m-a-p/MAP-CC is a General NLP dataset in Chinese from m-a-p in Parquet format.

About m-a-p/MAP-CC

MAP-CC

🌐 Homepage | 🤗 MAP-CC | 🤗 CHC-Bench | 🤗 CT-LLM | 📖 arXiv | GitHub

An open-source Chinese pretraining dataset with a scale of 800 billion tokens, offering the NLP community high-quality Chinese pretraining data.

Disclaimer Th...

Details

Task: General NLP
Language: Chinese
Format: Parquet
Rows / instances: N/A
Creator: m-a-p
Year: 2024
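
Because the corpus is described as roughly 800 billion tokens, a full download is rarely practical for a first look. The following is a minimal sketch, not an official loading recipe, of how a few rows might be inspected lazily with the Hugging Face `datasets` library. It assumes the repository id `m-a-p/MAP-CC` resolves on the Hub, that a `train` split exists, and that no configuration name is required; none of these are confirmed by this page. `peek_rows` is a hypothetical helper name.

```python
from itertools import islice

REPO_ID = "m-a-p/MAP-CC"  # repository id as given on this page


def peek_rows(repo_id=REPO_ID, n=3, split="train"):
    """Return the first n rows without downloading the full dataset.

    Assumptions (not confirmed by the source page): the split is named
    "train" and the repository needs no configuration name.
    """
    # Imported here so this module still loads if `datasets` is not installed.
    from datasets import load_dataset

    # streaming=True reads remote shards lazily instead of fetching everything.
    stream = load_dataset(repo_id, split=split, streaming=True)
    return list(islice(stream, n))


if __name__ == "__main__":
    # Column names are not documented here, so print each row as a dict.
    for row in peek_rows():
        print(row)
```

If the repository exposes several subsets, `load_dataset` will ask for a configuration name; in that case pass it as the second positional argument.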
