BLIP3o/BLIP3o-Pretrain-Long-Caption
General NLP · English · apache-2.0
The BLIP3o/BLIP3o-Pretrain-Long-Caption dataset is an English General NLP resource released by BLIP3o in 2025. It has 16.3K downloads and 67 likes, is released under the apache-2.0 license, and falls in the 10M<n<100M size category.
About BLIP3o/BLIP3o-Pretrain-Long-Caption
BLIP3o Pretrain Long-Caption Dataset
This collection contains 27 million images, each paired with a long (~120 token) caption generated by Qwen/Qwen2.5-VL-7B-Instruct.
Download
from huggingface_hub import snapshot_download
snapshot_...
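The download snippet above is cut off in this listing, so its original arguments are unknown. As a minimal sketch rather than the authors' exact command, the following shows one way `snapshot_download` from `huggingface_hub` could be called for this repository. The helper names `build_download_kwargs` and `download`, the `local_dir` value, and the file pattern in the usage note are assumptions made for illustration; only the repository ID comes from this page.

```python
# Hedged sketch: not the original (truncated) snippet from the dataset card.
REPO_ID = "BLIP3o/BLIP3o-Pretrain-Long-Caption"  # taken from this page


def build_download_kwargs(local_dir, patterns=None):
    """Assemble keyword arguments for huggingface_hub.snapshot_download.

    `local_dir` and `patterns` are illustrative parameters (assumptions).
    """
    kwargs = {
        "repo_id": REPO_ID,
        "repo_type": "dataset",  # the repo is a dataset, not a model
        "local_dir": local_dir,
    }
    if patterns:
        # Restrict the download to files matching these glob patterns.
        kwargs["allow_patterns"] = list(patterns)
    return kwargs


def download(local_dir, patterns=None):
    """Download the (optionally filtered) dataset snapshot to `local_dir`."""
    # Imported lazily so the helper above can be used without the library.
    from huggingface_hub import snapshot_download

    return snapshot_download(**build_download_kwargs(local_dir, patterns))
```

Because the full collection covers 27 million images, it may be worth fetching a small subset first, for example `download("./blip3o_long_caption", patterns=["*.md"])` to retrieve only the repository's Markdown files and confirm the layout before starting a full download.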
Details
- Task: General NLP
- Language: English
- Format: Parquet
- Rows / instances: N/A
- Size: 10M<n<100M
- Creator: BLIP3o
- Year: 2025
- License: apache-2.0
- Downloads: 16305
- Likes: 67