Question 1

What is the PleIAs/Post-OCR-Correction dataset?

Accepted Answer

Post-OCR correction is a large corpus of 1 billion words created by Pleias. It pairs original texts, which contain a varying number of OCR mistakes, with an experimental multilingual post-OCR correction output.
Generation of Post-OCR correction was performed...

Question 2

Is PleIAs/Post-OCR-Correction a benchmark?

Accepted Answer

No. PleIAs/Post-OCR-Correction is a dataset for training or evaluation; it isn't tracked as a standard LLM benchmark in our catalog.

Question 3

Where can I download PleIAs/Post-OCR-Correction?

Accepted Answer

PleIAs/Post-OCR-Correction is available at its source: https://huggingface.co/datasets/PleIAs/Post-OCR-Correction.
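
For readers who would rather fetch it programmatically, here is a minimal sketch using the Hugging Face `datasets` library. The helper names (`dataset_url`, `stream_first_rows`) are hypothetical, and the `split="train"` default and the optional `config` argument are assumptions rather than facts from this page; check the dataset card for the subsets and splits it actually exposes.

```python
from itertools import islice

# Repository id as it appears in the source URL above.
DATASET_ID = "PleIAs/Post-OCR-Correction"


def dataset_url(repo_id: str) -> str:
    """Build the Hugging Face dataset page URL for a repository id."""
    return f"https://huggingface.co/datasets/{repo_id}"


def stream_first_rows(repo_id: str = DATASET_ID, config=None, split: str = "train", n: int = 3):
    """Return the first n rows without downloading the whole corpus.

    Assumptions: the `datasets` package is installed, network access is
    available, and a split called "train" exists. If the dataset defines
    several subsets, `config` must name one of them.
    """
    # Imported lazily so the URL helper works without `datasets` installed.
    from datasets import load_dataset

    # streaming=True yields rows on demand instead of fetching every file.
    rows = load_dataset(repo_id, config, split=split, streaming=True)
    return list(islice(rows, n))


if __name__ == "__main__":
    print(dataset_url(DATASET_ID))
```

Streaming is chosen here because the corpus is large; calling `stream_first_rows()` is a cheap way to look at the columns before committing to a full download.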
