community-datasets/tapaco
Translation, Text Classification | AF, AR, AZ | cc-by-2.0
community-datasets/tapaco is a dataset for translation and text classification whose listed languages include AF, AR, and AZ. It provides 3,852,384 labeled examples in Parquet format, is distributed under the cc-by-2.0 license, falls in the 1M<n<10M size category, and has been downloaded 596 times.
About community-datasets/tapaco
Dataset Card for TaPaCo Corpus
Dataset Summary
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database.
Tatoeba is a crowdsourcing project mainly geared towards language learners. Its aim is to prov...
Details
- Task: Translation, Text Classification
- Language: AF, AR, AZ
- Format: Parquet
- Rows / instances: 3,852,384
- Size: 1M<n<10M
- Creator: community-datasets
- Year: 2022
- License: cc-by-2.0
- Downloads: 596
- Likes: 47
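Because the corpus is described as a paraphrase corpus, a common first step is to group sentences that belong to the same paraphrase set and enumerate the pairs within each set. The sketch below illustrates that idea on a few in-memory rows. The field names (`paraphrase_set_id`, `paraphrase`, `language`) and the sample sentences are assumptions made for illustration and are not taken from this listing; check the actual Parquet schema before relying on them.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical rows. The field names and sentences are assumptions for
# illustration only; they are not taken from the dataset listing.
SAMPLE_ROWS = [
    {"paraphrase_set_id": "1", "paraphrase": "Hello.", "language": "en"},
    {"paraphrase_set_id": "1", "paraphrase": "Hi.", "language": "en"},
    {"paraphrase_set_id": "1", "paraphrase": "Hey there.", "language": "en"},
    {"paraphrase_set_id": "2", "paraphrase": "Thanks.", "language": "en"},
]


def paraphrase_pairs(rows):
    """Return every unordered sentence pair that shares a language and set id."""
    groups = defaultdict(list)
    for row in rows:
        key = (row["language"], row["paraphrase_set_id"])
        groups[key].append(row["paraphrase"])

    pairs = []
    for sentences in groups.values():
        # A set with a single sentence yields no pairs.
        pairs.extend(combinations(sentences, 2))
    return pairs


if __name__ == "__main__":
    for first, second in paraphrase_pairs(SAMPLE_ROWS):
        print(first, "<->", second)
```

Grouping on both language and set id keeps pairs monolingual even if ids were reused across languages, which is a cautious choice rather than a documented property of the data.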