Question 1

What is the papluca/language-identification dataset?

Accepted Answer

Dataset Card for Language Identification dataset

Dataset Summary

The Language Identification dataset is a collection of 90k samples, each consisting of a text passage and its corresponding language label.
This dataset was created by collecting...

Question 2

Is papluca/language-identification a benchmark?

Accepted Answer

papluca/language-identification is a dataset for training or evaluation; it isn't tracked as a standard LLM benchmark in our catalog.

Question 3

Where can I download papluca/language-identification?

Accepted Answer

papluca/language-identification is available at its source: https://huggingface.co/datasets/papluca/language-identification.
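
As a rough sketch of how the download might look in practice, the dataset can typically be fetched with the Hugging Face `datasets` library (`pip install datasets`). The helper names `load_language_identification` and `label_counts` are hypothetical, introduced only for illustration. The split name `"train"` and the column name `labels` are assumptions about the dataset layout, so check them against the dataset page before relying on them.

```python
from collections import Counter

DATASET_ID = "papluca/language-identification"


def load_language_identification(split="train"):
    """Download one split from the Hugging Face Hub (needs network access).

    The split name "train" is an assumption; see the dataset page for the
    splits that actually exist.
    """
    # Imported lazily so label_counts() below works without the library.
    from datasets import load_dataset

    return load_dataset(DATASET_ID, split=split)


def label_counts(records, label_key="labels"):
    """Count samples per language label.

    `records` is any iterable of dict-like rows. The default column name
    "labels" is an assumption about this dataset's schema.
    """
    return Counter(row[label_key] for row in records)
```

With the library installed, `label_counts(load_language_identification("train"))` would give a per-language sample count, which is a quick way to confirm that the download worked and to see which languages are covered.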
