German Datasets
We catalog 12 German datasets for NLP and machine learning, including 1 benchmark. Browse the list below or narrow down by task.
German is a high-resource European language widely used in NLP benchmarks.
Updated June 2026
- clips/mfaq | Question Answering | CS, DA, DE
- Argumentation Annotated Student Peer Reviews Corpus (AASPRC) | Argument Component Classification, Argument Relation Classification | German
- CC100-German | Text Corpora | German
- coral-nlp/german-commons | Text Generation | DE
- Ten Thousand German News Articles Dataset (10kGNAD) | Classification | German
- LibriVoxDeEn | Speech Translation, Machine Translation | German, English
- Named Entity Model for German, Politics (NEMGP) | Named Entity Recognition (NER) | German
- Conference on Computational Natural Language Learning (CoNLL 2003) | Named Entity Recognition (NER), Part-of-Speech (POS) | German, English
- Wikidata NE dataset | Named Entity Recognition, Knowledge Base | German, English
- deepset/germanquad | Question Answering, Text Retrieval | DE
- ParCorFull | Machine Translation, Coreference Resolution | German, English | Benchmark
- papluca/language-identification | Text Classification | AR, BG, DE
What tasks do German datasets cover?
- Question Answering (2)
- Machine Translation (2)
- Named Entity Recognition (NER) (2)
- Argument Component Classification (1)
- Argument Relation Classification (1)
- Text Corpora (1)
- Text Generation (1)
- Classification (1)
- Speech Translation (1)
- Part-of-Speech (POS) (1)
- Named Entity Recognition (1)
- Knowledge Base (1)
- Text Retrieval (1)
- Coreference Resolution (1)
- Text Classification (1)