
CohereLabs/aya_collection_language_split

General NLP | ACE, AFR, AMH | apache-2.0

The CohereLabs/aya_collection_language_split dataset is a General NLP resource released by CohereLabs in 2024. It is tagged with the languages ACE, AFR, and AMH and contains 513,758,189 examples. It has 18.9K downloads and 119 likes, is released under the apache-2.0 license, and falls in the 100M<n<1B size category.

About CohereLabs/aya_collection_language_split

This is a re-upload of the aya_collection that differs only in how the data is organized. The original aya_collection is split into folders by dataset name, whereas this dataset is split by language. We recommend you use this versio...
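Because the data is split by language, one language can be loaded on its own instead of the full collection of roughly 514 million rows. The sketch below uses the Hugging Face `datasets` library and rests on several assumptions. It assumes each language is exposed as a config named after the language (`"afrikaans"` is a hypothetical example) and that each config has a `train` split. Check the dataset card for the actual config and split names. The helper names `language_split_kwargs` and `load_language_split` are illustrative only. Streaming is enabled by default so that nothing is downloaded in full up front.

```python
from typing import Any, Dict

REPO_ID = "CohereLabs/aya_collection_language_split"


def language_split_kwargs(config_name: str, streaming: bool = True) -> Dict[str, Any]:
    """Build keyword arguments for datasets.load_dataset.

    Assumption: each language is a separate config, and each config has a
    "train" split. Verify both against the dataset card.
    """
    if not config_name:
        raise ValueError("config_name must be a non-empty language config name")
    return {
        "path": REPO_ID,
        "name": config_name,   # hypothetical per-language config, e.g. "afrikaans"
        "split": "train",      # assumed split name
        "streaming": streaming,  # stream rows instead of downloading everything
    }


def load_language_split(config_name: str, streaming: bool = True):
    """Load a single language subset (requires `pip install datasets`)."""
    # Imported lazily so the argument helper above works without the library.
    from datasets import load_dataset

    return load_dataset(**language_split_kwargs(config_name, streaming))
```

As a usage example, `ds = load_language_split("afrikaans")` followed by `next(iter(ds))` would return the first example of that subset, provided the assumed config name exists. With `streaming=True`, the result is an iterable dataset that fetches rows lazily. This matters at this scale.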

Details

Task: General NLP
Language: ACE, AFR, AMH
Format: Parquet
Rows / instances: 513,758,189
Size: 100M<n<1B
Creator: CohereLabs
Year: 2024
License: apache-2.0
Downloads: 18,915
Likes: 119