Token Classification Datasets

There are 5 token classification datasets in our directory. Each entry links to its source, paper, and download. Browse the full list below or filter by language.

Token classification is the task of assigning a label to each individual token in a sequence. Part-of-speech tagging is a typical example.
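As a rough illustration of what "one label per token" means in practice, here is a minimal Python sketch for part-of-speech tagging. The sentence and its tags are invented for this example and are not taken from any dataset in the directory; the tag names follow the Universal POS convention, which is an assumption about labelling scheme rather than something the listed datasets necessarily use.

```python
# Illustrative only: the sentence and tags below are made up for this sketch.
tokens = ["The", "cat", "sat", "on", "the", "mat"]
pos_tags = ["DET", "NOUN", "VERB", "ADP", "DET", "NOUN"]  # Universal POS tag names (assumed scheme)

# Token classification assigns exactly one label to each token,
# so the token list and the label list must have the same length.
assert len(tokens) == len(pos_tags)

# Pair each token with its label and print one pair per line.
labelled = list(zip(tokens, pos_tags))
for token, tag in labelled:
    print(f"{token}\t{tag}")
```

Datasets for this task generally store examples in a similar aligned form, though field names and tag sets vary from one dataset to another.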

Updated June 2026

What languages do token classification datasets cover?

Frequently asked questions