mlfoundations/MINT-1T-HTML
Image To Text, Text Generation, EN
mlfoundations/MINT-1T-HTML is an English-language dataset for image-to-text and text generation tasks, distributed in Parquet format.
About mlfoundations/MINT-1T-HTML
MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens
MINT-1T is an open-source Multimodal INTerleaved dataset with 1 trillion text tokens and 3.4 billion images, a 10x scale-up from existing op...
Details
- Task: Image To Text, Text Generation
- Language: EN
- Format: Parquet
- Rows / instances: N/A
- Creator: mlfoundations
- Year: 2026
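
Because the row count is not listed and the dataset is large, it may be more practical to stream a few records than to download everything. The following is a minimal sketch, not an official loading recipe. It assumes the dataset is hosted on the Hugging Face Hub under the identifier shown above, that the Hugging Face `datasets` library is installed, and that a `train` split exists. The helper `peek` and its `loader` parameter are hypothetical names introduced here for illustration. No column names are assumed.

```python
from itertools import islice

# Dataset identifier as given on this page.
REPO_ID = "mlfoundations/MINT-1T-HTML"


def peek(n=1, split="train", loader=None):
    """Return the first n records of the dataset as a list of dicts.

    `split="train"` is an assumption; adjust it if the dataset exposes
    different splits. `loader` is a hypothetical hook that lets a
    stand-in be used instead of the real loading function.
    """
    if loader is None:
        # Imported lazily so the helper can be defined without the dependency.
        from datasets import load_dataset
        loader = load_dataset
    # streaming=True iterates over records lazily instead of downloading
    # every Parquet file up front.
    stream = loader(REPO_ID, split=split, streaming=True)
    return list(islice(stream, n))
```

Calling `peek(n=1)` and printing `sorted(record.keys())` for the returned record is one way to discover which columns the Parquet files contain before writing any processing code.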