BAAI/SVIT
Visual Question AnsweringENcc-by-4.0
BAAI/SVIT is a visual question answering-focused dataset in EN distributed in Parquet format. It is distributed under the cc-by-4.0 license and falls in the 1M<n<10M size category, and has been downloaded 54 times.
About BAAI/SVIT
# Dataset Card for SVIT
Scale up visual instruction tuning to millions by GPT-4.
## Dataset Description
- **Repository:** https://github.com/BAAI-DCAI/Visual-Instruction-Tuning
- **Paper:** https://arxiv.org/pdf/2307.04087.pdf
## Introduction
We Scale up Visual Instruction Tuning (SVIT) by constructing a dataset of 4.2 million visual instruction tuning data including 1.6M conversation question-answer (QA) pairs, 1.6M complex reasoning QA pairs, 1.0M referring QA pairs and 106K detailed image description, by prompting GPT-4 with the abundant manual annotations of image.
The structure of the repository:
- **raw**: The folder contains the original images and annotations from Visual Genome and MS-COCO.
- **data**: The folder contains the dataset in SVIT's original format.
- **format/llava-v1.5**: We also provide the dataset in LLaVA-v1.5's format to better align with the community.…
Details
- Task
- Visual Question Answering
- Language
- EN
- Format
- Parquet
- Rows / instances
- N/A
- Size
- 1M<n<10M
- Creator
- BAAI
- Year
- 2023
- License
- cc-by-4.0
- Downloads
- 54
- Likes
- 33