lightonai/LightOnOCR-mix-0126

Image To Text | EN, FR, DE

lightonai/LightOnOCR-mix-0126 is an image-to-text dataset in English, French, and German (EN, FR, DE), distributed in Parquet format.

About lightonai/LightOnOCR-mix-0126

LightOnOCR-mix-0126 is a large-scale OCR training dataset built via distillation: a strong vision–language model is prompted to produce naturally ordered full-page transcriptions (Markdown with LaTeX math spans and HTML tabl...
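
To make the transcription format described above more concrete, here is a minimal sketch of a helper that counts math spans and HTML tables in one page transcription. The delimiters are assumptions, since the description does not state them: `$...$` and `$$...$$` for LaTeX math, and `<table>...</table>` for tables. The function name `summarize_transcription` and the sample page are hypothetical and are not taken from the dataset.

```python
import re

# Assumed delimiters (not stated in the dataset description):
# $...$ or $$...$$ for LaTeX math, <table>...</table> for HTML tables.
MATH_SPAN = re.compile(r"\$\$.+?\$\$|\$[^$\n]+\$", re.DOTALL)
HTML_TABLE = re.compile(r"<table\b.*?</table>", re.DOTALL | re.IGNORECASE)


def summarize_transcription(text: str) -> dict:
    """Count math spans and HTML tables in one page transcription."""
    return {
        "math_spans": len(MATH_SPAN.findall(text)),
        "tables": len(HTML_TABLE.findall(text)),
    }


# Hypothetical sample page, invented for illustration only.
sample = (
    "# Results\n\n"
    "The energy is $E = mc^2$ and\n\n"
    "$$\\int_0^1 x\\,dx = \\tfrac{1}{2}$$\n\n"
    "<table><tr><td>a</td><td>b</td></tr></table>\n"
)

if __name__ == "__main__":
    print(summarize_transcription(sample))
```

A check like this could serve as a quick sanity pass over transcription text once the actual column names and delimiters have been confirmed against the dataset itself.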

Details

Task: Image To Text
Language: EN, FR, DE
Format: Parquet
Rows / instances: N/A
Creator: lightonai
Year: 2026