community-datasets/tapaco

Translation | Text Classification | AF, AR, AZ | cc-by-2.0

The community-datasets/tapaco dataset targets translation and text classification and is listed for AF, AR, AZ. It provides 3,852,384 rows in Parquet format, which places it in the 1M<n<10M size category. It is licensed under cc-by-2.0 and has been downloaded 596 times.

About community-datasets/tapaco

Dataset Card for TaPaCo Corpus

Dataset Summary: A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a crowdsourcing project mainly geared towards language learners. Its aim is to prov...

Details

Task: Translation, Text Classification
Language: AF, AR, AZ
Format: Parquet
Rows / instances: 3,852,384
Size: 1M<n<10M
Creator: community-datasets
Year: 2022
License: cc-by-2.0
Downloads: 596
Likes: 47
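
The size label in the details above is a power-of-ten bucket derived from the row count. As a rough illustration of how 3,852,384 rows lands in 1M<n<10M, here is a minimal sketch. The function name `size_category`, the list of bounds, the treatment of exact boundary values, and the top `n>1B` label are assumptions made for this example, not something the listing specifies.

```python
def size_category(n_rows: int) -> str:
    """Return a bucket label such as '1M<n<10M' for a row count.

    Hypothetical helper: bounds and boundary handling are assumptions.
    """
    if n_rows < 0:
        raise ValueError("row count must be non-negative")

    # (exclusive upper bound, label) pairs in ascending order.
    bounds = [
        (1_000, "1K"),
        (10_000, "10K"),
        (100_000, "100K"),
        (1_000_000, "1M"),
        (10_000_000, "10M"),
        (100_000_000, "100M"),
        (1_000_000_000, "1B"),
    ]

    lower = None  # label of the previous bound, None for the first bucket
    for limit, label in bounds:
        if n_rows < limit:
            return f"n<{label}" if lower is None else f"{lower}<n<{label}"
        lower = label
    # Anything at or above the last bound (assumed top bucket).
    return f"n>{lower}"


print(size_category(3_852_384))  # 1M<n<10M
```

A count that sits exactly on a bound (for example 1,000,000) is placed in the higher bucket here; that choice is arbitrary.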

FAQ