Cognitive-Lab/NayanaOCR_Corpus_2025
Image To Text, Visual Question Answering, Image Text To Text | AR, BN, DE | cc-by-nc-4.0
Cognitive-Lab/NayanaOCR_Corpus_2025 is an image-to-text dataset in AR, BN, and DE that provides 1,006,170 labeled examples in Parquet format. It is distributed under the cc-by-nc-4.0 license, falls in the 1M<n<10M size category, and has been downloaded 12.9K times.
About Cognitive-Lab/NayanaOCR_Corpus_2025
🪷 NayanaOCR Corpus 2025
A 1M-page, 22-language fully-parallel synthetic OCR + VQA corpus for document-centric vision-language models — every page rendered in every language.
NayanaOCR Corpus 2025 is one of the largest open-source multilingual...
Details
- Task: Image To Text, Visual Question Answering, Image Text To Text
- Language: AR, BN, DE
- Format: Parquet
- Rows / instances: 1006170
- Size: 1M<n<10M
- Creator: Cognitive-Lab
- Year: 2026
- License: cc-by-nc-4.0
- Downloads: 12870
- Likes: 15
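
With roughly a million Parquet-backed rows, it may be convenient to look at a few examples before downloading everything. The sketch below is one way to do that, assuming the corpus is hosted on the Hugging Face Hub under the identifier shown on this page and that the `datasets` library is installed. The split name `train` and the helper name `peek_rows` are assumptions for illustration; the dataset may also require a configuration name (for example, per language), which this page does not list.

```python
from itertools import islice

# Repository identifier as it appears on this page.
REPO_ID = "Cognitive-Lab/NayanaOCR_Corpus_2025"


def peek_rows(repo_id=REPO_ID, split="train", n=3):
    """Return the first n rows of the dataset as a list of dicts.

    Assumptions: the dataset lives on the Hugging Face Hub and exposes a
    split called "train". Adjust `split` (or pass a config name to
    load_dataset) if the actual layout differs.
    """
    # Imported lazily so that defining this helper does not require the
    # `datasets` package (pip install datasets).
    from datasets import load_dataset

    # streaming=True iterates over the remote Parquet shards instead of
    # downloading the whole corpus first.
    stream = load_dataset(repo_id, split=split, streaming=True)
    return list(islice(stream, n))
```

Calling `peek_rows()` and printing `sorted(row.keys())` for each returned row is a quick way to discover the actual column names, which are not documented here. Note that the cc-by-nc-4.0 license restricts use to non-commercial purposes.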