papluca/language-identification
Text Classification · AR, BG, DE
papluca/language-identification is a text classification dataset in AR, BG, DE, distributed in Parquet format.
About papluca/language-identification
Dataset Card for Language Identification dataset
Dataset Summary
The Language Identification dataset is a collection of 90k samples, each consisting of a text passage and its corresponding language label.
This dataset was created by collecting...
Details
- Task: Text Classification
- Language: AR, BG, DE
- Format: Parquet
- Rows / instances: N/A
- Creator: papluca
- Year: 2022
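
Since each sample pairs a text passage with a language label, a quick first check is to count how many samples carry each label. The sketch below is illustrative only. It assumes the records expose a `text` column and a `labels` column holding lowercase language codes, and that a `train` split exists; none of that is stated on this page. `label_counts` and `load_train_split` are hypothetical helper names, and the sample sentences are made up. Loading from the Hub needs the Hugging Face `datasets` package and network access, so that import is kept inside its own function.

```python
from collections import Counter
from typing import Iterable, Mapping


def label_counts(records: Iterable[Mapping[str, str]], label_key: str = "labels") -> Counter:
    """Count how many samples carry each language label."""
    return Counter(record[label_key] for record in records)


def load_train_split():
    """Load the (assumed) train split from the Hugging Face Hub."""
    # Imported lazily so label_counts works without the `datasets` package.
    from datasets import load_dataset

    return load_dataset("papluca/language-identification", split="train")


# Tiny in-memory stand-in using the assumed schema; the texts are invented.
sample = [
    {"labels": "de", "text": "Das ist ein Beispiel."},
    {"labels": "bg", "text": "Това е пример."},
    {"labels": "ar", "text": "هذا مثال."},
    {"labels": "de", "text": "Noch ein Satz."},
]

counts = label_counts(sample)
print(counts.most_common())
```

To run the same count on the real data, pass the result of `load_train_split()` to `label_counts` and adjust `label_key` if the column is named differently.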