
m-a-p/PIN-14M

General NLP | EN, ZH

The m-a-p/PIN-14M dataset is an English (EN) and Chinese (ZH) General NLP resource released by m-a-p in 2024.

About m-a-p/PIN-14M

PIN-14M: a mini version of "PIN: A Knowledge-Intensive Dataset for Paired and Interleaved Multimodal Documents".

Paper: https://arxiv.org/abs/2406.13923

This dataset contains 14M samples in PIN format and takes up around 18.79 TB of storage.

🚀 News [ 2025....
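To put the storage figure quoted above in perspective, the following is a rough back-of-the-envelope sketch of the average size per sample. It assumes that "14M" means exactly 14,000,000 samples and that "TB" means decimal terabytes (10^12 bytes); neither is confirmed by the description, and the function name is purely illustrative.

```python
# Assumptions (not stated in the source): "14M" is exactly 14,000,000
# samples and "TB" is a decimal terabyte (10**12 bytes).
TOTAL_BYTES = 18.79 * 10**12
NUM_SAMPLES = 14_000_000


def average_sample_size_mb(total_bytes: float, num_samples: int) -> float:
    """Return the mean storage per sample in decimal megabytes."""
    return total_bytes / num_samples / 10**6


avg_mb = average_sample_size_mb(TOTAL_BYTES, NUM_SAMPLES)
print(f"~{avg_mb:.2f} MB per sample")
```

Under these assumptions the average comes out at a little over 1 MB per sample, which suggests that working with a subset of the Parquet files may be more practical than fetching the full dataset at once.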

Details

Task: General NLP
Language: EN, ZH
Format: Parquet
Rows / instances: N/A
Creator: m-a-p
Year: 2024
