Information Extraction Datasets
Our directory catalogs 15 datasets for information extraction, a machine-learning task. Each entry links to its source, paper, and download. Browse the full list below or filter by language.
Updated June 2026
- PerKey | Keyphrase Extraction, Information Extraction | Persian
- BioCreative II Gene Mention Recognition (BC2GM) | Information Extraction, Named Entity Recognition (NER) | English
- BC5CDR Drug/Chemical (BC5-Chem) | Information Extraction, Named Entity Recognition (NER) | English
- BC5CDR Disease (BC5-Disease) | Information Extraction, Named Entity Recognition (NER) | English
- JNLPBA | Information Extraction, Named Entity Recognition (NER) | English
- NCBI Disease Corpus | Information Extraction, Named Entity Recognition (NER) | English
- Adverse Drug Effect (ADE) Corpus | Information Extraction | English
- DNA Methylation Corpus | Information Extraction, Entity Extraction, Event Extraction | English
- Exhaustive PTM Corpus | Information Extraction, Event Extraction | English
- mTOR Pathway Corpus | Information Extraction, Entity Extraction, Event Extraction | English
- PTM Event Corpus | Information Extraction, Event Extraction | English
- T4SS Event Corpus | Information Extraction, Event Extraction | English
- The New York Times Annotated Corpus | Summarization, Information Extraction | English
- Quda | Information Extraction, Visualization | English
- An Open Information Extraction Corpus (OPIEC) | Knowledge Base, Information Extraction | English