lightonai/LightOnOCR-mix-0126
Image To Text · EN, FR, DE
lightonai/LightOnOCR-mix-0126 is an image-to-text dataset in English (EN), French (FR), and German (DE), distributed in Parquet format.
About lightonai/LightOnOCR-mix-0126
LightOnOCR-mix-0126 is a large-scale OCR training dataset built via distillation: a strong vision–language model is prompted to produce naturally ordered full-page transcriptions (Markdown with LaTeX math spans and HTML tabl...
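Because each target is a full-page transcription that mixes Markdown, LaTeX math spans, and HTML tables, it can help to profile what a page contains before training or filtering. The sketch below counts display math, inline math, and HTML tables in one transcription string using only the standard library. It is a hypothetical helper, not part of the dataset or any LightOn tooling: the card does not state which LaTeX delimiters are used, so the sketch assumes `$$...$$` or `\[...\]` for display math and `$...$` or `\(...\)` for inline math, and the sample page is invented for illustration.

```python
import re
from dataclasses import dataclass

# Assumed delimiters (not confirmed by the dataset card).
HTML_TABLE = re.compile(r"<table\b.*?</table>", re.DOTALL | re.IGNORECASE)
DISPLAY_MATH = re.compile(r"\$\$(.+?)\$\$|\\\[(.+?)\\\]", re.DOTALL)
# Inline math: single-dollar spans on one line, or \( ... \) spans.
INLINE_MATH = re.compile(r"\$([^$\n]+?)\$|\\\((.+?)\\\)")


@dataclass
class PageStats:
    display_math: int
    inline_math: int
    tables: int


def summarize_transcription(text: str) -> PageStats:
    """Count tables and math spans in one Markdown page transcription."""
    # Remove tables first so cell contents are not scanned for math.
    tables = HTML_TABLE.findall(text)
    text = HTML_TABLE.sub(" ", text)
    # Remove display math next so "$$" is not mistaken for two inline "$".
    display = DISPLAY_MATH.findall(text)
    text = DISPLAY_MATH.sub(" ", text)
    inline = INLINE_MATH.findall(text)
    return PageStats(len(display), len(inline), len(tables))


# Hypothetical sample page in the described output style.
sample = r"""# Section

The loss is $L = -\log p(y \mid x)$ and the bound \(a \le b\) holds.

$$
E = mc^2
$$

<table><tr><th>Name</th><th>Value</th></tr><tr><td>x</td><td>1</td></tr></table>
"""

if __name__ == "__main__":
    print(summarize_transcription(sample))
```

Stripping tables and display math before looking for inline spans keeps the counts from overlapping. A plain-text page that uses `$` as a currency symbol (for example, "$5 and $10") would be miscounted as inline math, so a real filter would need stricter rules.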
Details
- Task: Image To Text
- Language: EN, FR, DE
- Format: Parquet
- Rows / instances: N/A
- Creator: lightonai
- Year: 2026