m-a-p/PIN-14M
General NLP | EN, ZH
The m-a-p/PIN-14M dataset is an English and Chinese (EN, ZH) General NLP resource released by m-a-p in 2024.
About m-a-p/PIN-14M
PIN-14M
A mini version of "PIN: A Knowledge-Intensive Dataset for Paired and Interleaved Multimodal Documents"
Paper: https://arxiv.org/abs/2406.13923
This dataset contains 14M samples in PIN format and occupies around 18.79 TB of storage.
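To give a rough sense of scale, the two figures above imply an average storage footprint per sample. The sketch below works through that arithmetic. It assumes that "14M" means exactly 14,000,000 samples and that "TB" is the decimal unit (10^12 bytes); neither is stated in the source, and the function name is purely illustrative.

```python
# Assumptions (not stated in the source): "14M" is exactly 14,000,000
# samples, and "TB" is the decimal unit (10**12 bytes).
TOTAL_BYTES = 18.79 * 10**12
NUM_SAMPLES = 14_000_000


def average_sample_megabytes(total_bytes: float, num_samples: int) -> float:
    """Return the mean storage per sample in decimal megabytes (10**6 bytes)."""
    return total_bytes / num_samples / 10**6


avg_mb = average_sample_megabytes(TOTAL_BYTES, NUM_SAMPLES)
print(f"~{avg_mb:.2f} MB per sample")
```

Under these assumptions the average comes out to a little over 1 MB per sample, which suggests that streaming or downloading individual shards is more practical than fetching the whole dataset at once.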
News
[ 2025....
Details
- Task: General NLP
- Language: EN, ZH
- Format: Parquet
- Rows / instances: N/A
- Creator: m-a-p
- Year: 2024