
PleIAs/Post-OCR-Correction

General NLP | FR, EN, IT

The PleIAs/Post-OCR-Correction dataset is a General NLP resource in French, English, and Italian, released by PleIAs in 2024.

About PleIAs/Post-OCR-Correction

Post-OCR correction is a large corpus of 1 billion words. It pairs original texts, which contain a varying number of OCR mistakes, with an experimental multilingual post-OCR correction output created by Pleias. Generation of Post-OCR correction was performed...
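Because the corpus pairs noisy OCR text with a corrected version, one quick way to gauge how heavily a passage was corrected is to compare the two strings. The sketch below uses only Python's standard `difflib` module. The function name `change_ratio` and the sample strings are illustrative assumptions and are not taken from the dataset or its schema.

```python
import difflib


def change_ratio(ocr_text: str, corrected_text: str) -> float:
    """Return 1 minus difflib's similarity ratio for the two strings.

    0.0 means the strings are identical; values near 1.0 mean they share
    almost no matching characters. This is a rough proxy for how much a
    correction changed the text, not an official metric of the dataset.
    """
    matcher = difflib.SequenceMatcher(None, ocr_text, corrected_text, autojunk=False)
    return 1.0 - matcher.ratio()


# Hypothetical example pair (not drawn from the corpus).
noisy = "Tbe qnick brown fox"
clean = "The quick brown fox"
print(f"change ratio: {change_ratio(noisy, clean):.3f}")
```

A low value suggests that the original text was already close to the corrected output, while a high value flags passages where the correction rewrote most of the text and may deserve a manual check.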

Details

Task: General NLP
Language: FR, EN, IT
Format: Parquet
Rows / instances: N/A
Creator: PleIAs
Year: 2024
