Skip to content

BAAI/SVIT

Visual Question AnsweringENcc-by-4.0

BAAI/SVIT is a visual question answering-focused dataset in EN distributed in Parquet format. It is distributed under the cc-by-4.0 license and falls in the 1M<n<10M size category, and has been downloaded 54 times.

About BAAI/SVIT

# Dataset Card for SVIT Scale up visual instruction tuning to millions by GPT-4. ## Dataset Description - **Repository:** https://github.com/BAAI-DCAI/Visual-Instruction-Tuning - **Paper:** https://arxiv.org/pdf/2307.04087.pdf ## Introduction We Scale up Visual Instruction Tuning (SVIT) by constructing a dataset of 4.2 million visual instruction tuning data including 1.6M conversation question-answer (QA) pairs, 1.6M complex reasoning QA pairs, 1.0M referring QA pairs and 106K detailed image description, by prompting GPT-4 with the abundant manual annotations of image. The structure of the repository: - **raw**: The folder contains the original images and annotations from Visual Genome and MS-COCO. - **data**: The folder contains the dataset in SVIT's original format. - **format/llava-v1.5**: We also provide the dataset in LLaVA-v1.5's format to better align with the community.…

Details

Task
Visual Question Answering
Language
EN
Format
Parquet
Rows / instances
N/A
Size
1M<n<10M
Creator
BAAI
Year
2023
License
cc-by-4.0
Downloads
54
Likes
33
Download Homepage

Related Visual Question Answering datasets

FAQ