Spawning/PD12M

General NLP · EN

Spawning/PD12M is an English-language General NLP dataset from Spawning, distributed in Parquet format.

About Spawning/PD12M

PD12M Summary

At 12.4 million image-caption pairs, PD12M is the largest public domain image-text dataset to date, with sufficient size to train foundation models while minimizing copyright concerns. Through the Source.Plus platform,...

Details

Task: General NLP
Language: EN
Format: Parquet
Rows / instances: N/A
Creator: Spawning
Year: 2024