PleIAs/Post-OCR-Correction
General NLP | FR, EN, IT
The PleIAs/Post-OCR-Correction dataset is a General NLP resource in French, English, and Italian, released by PleIAs in 2024.
About PleIAs/Post-OCR-Correction
Post-OCR correction is a large corpus of 1 billion words. It contains original texts with a varying number of OCR mistakes, alongside experimental multilingual post-OCR correction output created by Pleias.
Generation of Post-OCR correction was performed...
Details
- Task: General NLP
- Language: FR, EN, IT
- Format: Parquet
- Rows / instances: N/A
- Creator: PleIAs
- Year: 2024