
vidore/colpali_train_set

Document Question Answering, Visual Document Retrieval, English

vidore/colpali_train_set is an English-language document question answering dataset published by vidore and distributed in Parquet format.

About vidore/colpali_train_set

Dataset Description

This dataset is the training set of ColPali. It includes 127,460 query-image pairs, drawn both from openly available academic datasets (63%) and from a synthetic dataset made up of pages from web-crawled PDF documents and augmented with...
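To put the stated composition in absolute terms, the 63% academic share can be converted into approximate pair counts. This is a rough sketch that assumes the 63% figure is a rounded share of the 127,460 total pairs, so the resulting counts are estimates, not official split sizes.

```python
# Total query-image pairs stated in the dataset description.
TOTAL_PAIRS = 127_460
# Share from openly available academic datasets (rounded, per the description).
ACADEMIC_SHARE = 0.63

# Approximate number of pairs from academic datasets.
academic = round(TOTAL_PAIRS * ACADEMIC_SHARE)
# The remainder comes from the synthetic, web-crawled PDF portion.
synthetic = TOTAL_PAIRS - academic

print(academic, synthetic)
```

Because the percentage is rounded, the true academic count could differ from this estimate by several hundred pairs.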

Details

Task: Document Question Answering, Visual Document Retrieval
Language: English
Format: Parquet
Rows / instances: N/A
Creator: vidore
Year: 2024