OpenGVLab/OmniCorpus-CC-210M
Image To Text | Visual Question Answering | EN
OpenGVLab/OmniCorpus-CC-210M is an image-to-text dataset in English (EN) from OpenGVLab, distributed in Parquet format.
About OpenGVLab/OmniCorpus-CC-210M
🐳 OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
This repository contains 210 million image-text interleaved documents filtered from the OmniCorpus-CC dataset, which was sourced from Common Crawl.
Repo...
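Each document in this corpus interleaves images with text. The following is a minimal sketch of how a loader might flatten one such document into a single text stream with image placeholders. The record layout is an assumption for illustration, not the dataset's confirmed schema: it assumes two parallel lists, `texts` and `images`, where each position holds exactly one non-empty entry. The `<image>` token and the `flatten_interleaved` helper are also hypothetical. Check the actual Parquet columns before adapting it.

```python
from typing import List, Optional, Tuple

# Hypothetical placeholder token marking where an image sits in the text stream.
IMAGE_TOKEN = "<image>"


def flatten_interleaved(
    texts: List[Optional[str]], images: List[Optional[str]]
) -> Tuple[str, List[str]]:
    """Merge parallel text/image slots into one string plus an ordered image list.

    Assumed layout (hypothetical): texts[i] and images[i] describe the same
    position in the document, and exactly one of them is not None.
    """
    if len(texts) != len(images):
        raise ValueError("texts and images must have the same length")
    parts: List[str] = []
    image_refs: List[str] = []
    for text, image in zip(texts, images):
        # Reject slots that are both empty or both filled.
        if (text is None) == (image is None):
            raise ValueError("each position must hold exactly one of text or image")
        if image is not None:
            parts.append(IMAGE_TOKEN)  # keep the image's position in the text
            image_refs.append(image)   # keep images in document order
        else:
            parts.append(text)
    return " ".join(parts), image_refs


doc_text, refs = flatten_interleaved(
    ["A harbor at dawn.", None, "Boats leave at six."],
    [None, "https://example.com/harbor.jpg", None],
)
print(doc_text)  # A harbor at dawn. <image> Boats leave at six.
print(refs)      # ['https://example.com/harbor.jpg']
```

Keeping images in a separate ordered list lets a model's preprocessor load them independently while the placeholder tokens preserve where each one appeared relative to the surrounding text.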
Details
- Task: Image To Text, Visual Question Answering
- Language: EN
- Format: Parquet
- Rows / instances: N/A
- Creator: OpenGVLab
- Year: 2024