vidore/colpali_train_set
Document Question Answering, Visual Document Retrieval, English
vidore/colpali_train_set is an English dataset for document question answering and visual document retrieval, published by vidore in Parquet format.
About vidore/colpali_train_set
Dataset Description
This dataset is the training set of ColPali. It includes 127,460 query-image pairs drawn from both openly available academic datasets (63%) and a synthetic dataset made up of pages from web-crawled PDF documents and augmented with...
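To give a rough sense of the composition, the sketch below turns the quoted 63% share into approximate pair counts. It assumes that every pair not from the academic datasets comes from the synthetic set, and the 63% figure is itself rounded, so the results are estimates rather than exact counts.

```python
# Figures quoted in the dataset description.
TOTAL_PAIRS = 127_460
ACADEMIC_SHARE = 0.63  # stated share of pairs from academic datasets

# Assumption: all remaining pairs come from the synthetic dataset.
academic_pairs = round(TOTAL_PAIRS * ACADEMIC_SHARE)
synthetic_pairs = TOTAL_PAIRS - academic_pairs

print(f"academic:  ~{academic_pairs:,}")
print(f"synthetic: ~{synthetic_pairs:,}")
```

Under these assumptions, roughly 80,300 pairs are academic and roughly 47,160 are synthetic.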
Details
- Task: Document Question Answering, Visual Document Retrieval
- Language: English
- Format: Parquet
- Rows / instances: N/A
- Creator: vidore
- Year: 2024
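
A minimal sketch of how the dataset might be loaded with the Hugging Face `datasets` library. It assumes the dataset is hosted on the Hugging Face Hub under the identifier shown on this page and that it exposes a split named "train"; neither is confirmed here. The helper name `load_colpali_train` is hypothetical.

```python
REPO_ID = "vidore/colpali_train_set"  # dataset identifier taken from this page


def load_colpali_train(split="train", streaming=True):
    """Return the requested split of the dataset.

    Assumptions: the dataset is available on the Hugging Face Hub under
    REPO_ID and has a split named "train".
    """
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    # streaming=True iterates over the Parquet shards on demand instead of
    # downloading the whole dataset first.
    return load_dataset(REPO_ID, split=split, streaming=streaming)
```

Calling `next(iter(load_colpali_train()))` should yield a single record as a dictionary. Inspecting its keys is safer than assuming particular column names, since the columns are not listed on this page.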