bevaya/pubmed-ocr

Image To Text, Image Text To Text, EN

Created by bevaya in 2026, bevaya/pubmed-ocr is an image-to-text dataset in English, distributed in Parquet format. It has 2,241 downloads and 71 likes. Its license is listed as "other", and it falls in the 1M<n<10M size category.

About bevaya/pubmed-ocr

PubMed-OCR: PMC Open Access OCR Annotations

PubMed-OCR is an OCR-centric corpus of scientific articles derived from PubMed Central Open Access PDFs. Each page is rendered to an image and annotated with Google Cloud Vision OCR, released in a com...

Details

Task: Image To Text, Image Text To Text
Language: EN
Format: Parquet
Rows / instances: N/A
Size: 1M<n<10M
Creator: bevaya
Year: 2026
License: other
Downloads: 2241
Likes: 71
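
The size label above is a bucket, not an exact count (the listing gives rows / instances as N/A). As a minimal sketch, the bucket can be turned into numeric bounds like this. It assumes that n counts instances and that K, M, B, and T are decimal multipliers. The helper name parse_size_category is hypothetical and is not part of any dataset tooling.

```python
import re

# Assumed decimal multipliers for the suffixes used in size labels.
_SUFFIX = {"": 1, "K": 1_000, "M": 1_000_000, "B": 1_000_000_000, "T": 1_000_000_000_000}


def parse_size_category(label: str) -> tuple[int, int]:
    """Return the (lower, upper) exclusive bounds for a label such as '1M<n<10M'."""
    match = re.fullmatch(r"(\d+)([KMBT]?)<n<(\d+)([KMBT]?)", label.strip())
    if match is None:
        raise ValueError(f"unrecognized size category: {label!r}")
    lo_num, lo_suf, hi_num, hi_suf = match.groups()
    return int(lo_num) * _SUFFIX[lo_suf], int(hi_num) * _SUFFIX[hi_suf]


lower, upper = parse_size_category("1M<n<10M")
print(lower, upper)  # 1000000 10000000
```

Under these assumptions, the label places the dataset between one million and ten million instances, which is useful when estimating storage or planning to stream the Parquet files rather than load them all at once.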