mlfoundations/MINT-1T-PDF-CC-2023-40

Image To Text, Text Generation, EN

The mlfoundations/MINT-1T-PDF-CC-2023-40 dataset is an English-language image-to-text resource released by mlfoundations in 2024.

About mlfoundations/MINT-1T-PDF-CC-2023-40

šŸƒ MINT-1T:Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens šŸƒ MINT-1T is an open-source Multimodal INTerleaved dataset with 1 trillion text tokens and 3.4 billion images, a 10x scale-up from existing open-...

Details

Task: Image To Text, Text Generation
Language: EN
Format: Parquet
Rows / instances: N/A
Creator: mlfoundations
Year: 2024
