BLIP3o/BLIP3o-Pretrain-Long-Caption

General NLP | English | apache-2.0

The BLIP3o/BLIP3o-Pretrain-Long-Caption dataset is an English General NLP resource released by BLIP3o in 2025. It has 16.3K downloads and 67 likes, is released under the apache-2.0 license, and falls in the 10M<n<100M size category.

About BLIP3o/BLIP3o-Pretrain-Long-Caption

BLIP3o Pretrain Long-Caption Dataset

This collection contains 27 million images, each paired with a long (~120 token) caption generated by Qwen/Qwen2.5-VL-7B-Instruct.

Download

from huggingface_hub import snapshot_download
snapshot_...
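
The download snippet in the description above is cut off in this listing. The following is a minimal sketch of how such a download could look, not the creator's exact snippet. It assumes the `huggingface_hub` package is installed and that the repository is a dataset repo. The helper names `build_download_args` and `download_long_caption`, and the example target directory, are hypothetical and chosen only for illustration.

```python
from pathlib import Path

# Repository ID as shown in the page title.
REPO_ID = "BLIP3o/BLIP3o-Pretrain-Long-Caption"


def build_download_args(local_dir: str) -> dict:
    """Assemble keyword arguments for huggingface_hub.snapshot_download.

    Hypothetical helper: it only builds the argument dict, so it can be
    inspected without touching the network.
    """
    return {
        "repo_id": REPO_ID,
        "repo_type": "dataset",  # assumption: the repo is hosted as a dataset
        "local_dir": str(Path(local_dir).expanduser()),
    }


def download_long_caption(local_dir: str) -> str:
    """Download the full repository snapshot and return its local path.

    The collection holds 27 million image-caption pairs, so expect a very
    large transfer and make sure enough disk space is available.
    """
    # Imported here so the module loads even where huggingface_hub is absent.
    from huggingface_hub import snapshot_download

    return snapshot_download(**build_download_args(local_dir))
```

Calling `download_long_caption("~/data/blip3o-long-caption")` would fetch the whole snapshot into that directory. To fetch only part of the repository, `snapshot_download` also accepts an `allow_patterns` argument, but the right patterns depend on the repository's file layout, which this listing does not show.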

Details

Task: General NLP
Language: English
Format: Parquet
Rows / instances: N/A
Size: 10M<n<100M
Creator: BLIP3o
Year: 2025
License: apache-2.0
Downloads: 16305
Likes: 67
