pixparse/pdfa-eng-wds

Image To Text · EN

pixparse/pdfa-eng-wds is an image-to-text dataset in English (EN), distributed in Parquet format.

About pixparse/pdfa-eng-wds

Dataset Card for PDF Association dataset (PDFA)

Dataset Summary

The PDFA dataset is a document dataset filtered from the SafeDocs corpus, also known as CC-MAIN-2021-31-PDF-UNTRUNCATED. The original purpose of that corpus is for comprehensive pdf d...

Details

Task: Image To Text
Language: EN
Format: Parquet
Rows / instances: N/A
Creator: pixparse
Year: 2024