commoncrawl/CommonLID
Text Classification | ACE, ACF, AEB
The commoncrawl/CommonLID dataset is a text classification resource published by commoncrawl in 2026 and tagged with the language codes ACE, ACF, and AEB. It has 151 downloads and 53 likes. Its license is listed as "other", and it falls in the 100K<n<1M size category.
About commoncrawl/CommonLID
CommonLID
CommonLID is a community-created language identification (LID) benchmark. CommonLID consists of web text manually annotated for the language that it is written in. CommonLID contains annotations for 109 languages, where 78 of those la...
Details
- Task: Text Classification
- Language: ACE, ACF, AEB
- Format: Parquet
- Rows / instances: N/A
- Size: 100K<n<1M
- Creator: commoncrawl
- Year: 2026
- License: other
- Downloads: 151
- Likes: 53
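
Because CommonLID is described as a language identification benchmark with one annotated language per text, a natural way to use it is to compare a model's predicted language against the annotated one. The sketch below shows one possible scoring scheme, macro-averaged F1 over languages. This listing does not state which metric the benchmark uses, so the choice of macro F1 is an assumption, as are the function names and the lowercase label strings, which are illustrative only. The toy lists stand in for real annotations and predictions.

```python
from collections import Counter


def per_language_f1(gold, predicted):
    """Return {language: F1} for every language seen in gold or predicted.

    Assumption: each example has exactly one gold label and one predicted label.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # p was predicted but is wrong
            fn[g] += 1  # g was the right answer but was missed

    scores = {}
    for lang in set(gold) | set(predicted):
        denom = 2 * tp[lang] + fp[lang] + fn[lang]
        scores[lang] = 2 * tp[lang] / denom if denom else 0.0
    return scores


def macro_f1(gold, predicted):
    """Unweighted mean of per-language F1, so small languages count equally."""
    scores = per_language_f1(gold, predicted)
    return sum(scores.values()) / len(scores) if scores else 0.0


# Hypothetical toy data; real labels would come from the dataset itself.
gold = ["ace", "ace", "acf", "aeb"]
predicted = ["ace", "acf", "acf", "aeb"]

scores = per_language_f1(gold, predicted)
overall = macro_f1(gold, predicted)
```

Macro averaging is used here because a benchmark covering many languages is likely to have uneven counts per language, and an unweighted mean keeps frequent languages from dominating the score. A micro-averaged or accuracy-based score would be an equally valid alternative.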