deepcs233/Visual-CoT
Image Text To Text | EN | apache-2.0
deepcs233/Visual-CoT is an image-text-to-text dataset in English (EN), distributed in Parquet format. It is released under the apache-2.0 license and has been downloaded about 2.5K times.
About deepcs233/Visual-CoT
VisCoT Dataset Card
There is a shortage of multimodal datasets for training multimodal large language models (MLLMs) that need to identify specific regions in an image and give them additional attention to improve response performance. This type of...
Details
- Task: Image Text To Text
- Language: EN
- Format: Parquet
- Rows / instances: N/A
- Creator: deepcs233
- Year: 2024
- License: apache-2.0
- Downloads: 2471
- Likes: 63