
CohereLabs/wikipedia-2023-11-embed-multilingual-v3

General NLP | English

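The CohereLabs/wikipedia-2023-11-embed-multilingual-v3 dataset is a General NLP resource released by CohereLabs in 2024. It is tagged as English in this catalog, although its description states that it covers Wikipedia in 300+ languages. It has 10.9K downloads and 248 likes, and its size category is 100M<n<1B (between 100 million and 1 billion examples).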

About CohereLabs/wikipedia-2023-11-embed-multilingual-v3

Multilingual Embeddings for Wikipedia in 300+ Languages

This dataset contains the 2023-11-01 dump of the wikimedia/wikipedia dataset, covering Wikipedia in all 300+ languages. The individual articles have been chunked and embedded with the state-of-t...
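Because each row pairs an article chunk with its embedding, a typical use is nearest-neighbor search: embed a query with the same model and rank chunks by similarity. The sketch below shows that ranking step in plain Python. It is illustrative only: the field names (`title`, `text`, `emb`), the 3-dimensional toy vectors, and the query vector are hypothetical stand-ins, not real rows or the dataset's documented schema, and real embeddings have far more dimensions.

```python
import math

# Hypothetical toy rows standing in for dataset chunks. The field names and
# the 3-dimensional vectors are illustrative assumptions, not real data.
chunks = [
    {"title": "Paris", "text": "Paris is the capital of France.", "emb": [0.9, 0.1, 0.0]},
    {"title": "Berlin", "text": "Berlin is the capital of Germany.", "emb": [0.1, 0.9, 0.0]},
    {"title": "Photosynthesis", "text": "Plants convert light into energy.", "emb": [0.0, 0.1, 0.9]},
]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the norms."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def search(query_emb, rows, k=2):
    """Return the k rows whose embeddings are most similar to query_emb."""
    ranked = sorted(rows, key=lambda r: cosine(query_emb, r["emb"]), reverse=True)
    return ranked[:k]


# In practice the query vector would come from the same embedding model used
# for the chunks; here it is a made-up vector close to the "Paris" chunk.
query_emb = [0.8, 0.2, 0.1]
for row in search(query_emb, chunks):
    print(row["title"], "->", row["text"])
```

A brute-force scan like this is only practical for small samples; at the 100M<n<1B scale an approximate nearest-neighbor index would normally be used instead.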

Details

Task: General NLP
Language: English
Format: Parquet
Rows / instances: N/A
Size: 100M<n<1B
Creator: CohereLabs
Year: 2024
Downloads: 10926
Likes: 248
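The size category gives a rough sense of the storage involved. The back-of-envelope sketch below estimates the uncompressed size of the embedding vectors alone across the 100M<n<1B range. The vector width of 1024 and float32 storage are assumptions (not stated on this page), and Parquet compression, text columns, and metadata are ignored, so treat the result as an order-of-magnitude estimate only.

```python
# Assumptions (not stated on this page): one vector per row, 1024 dimensions,
# stored as 4-byte float32 values. Compression and other columns are ignored.
DIMS = 1024
BYTES_PER_FLOAT = 4


def embedding_bytes(n_rows, dims=DIMS, bytes_per_value=BYTES_PER_FLOAT):
    """Uncompressed bytes needed to store n_rows embedding vectors."""
    return n_rows * dims * bytes_per_value


# Bounds of the catalog's 100M<n<1B size category.
low, high = 100_000_000, 1_000_000_000
print(f"{embedding_bytes(low) / 1e9:.1f} GB to {embedding_bytes(high) / 1e9:.1f} GB")
```

Under these assumptions the raw vectors alone span roughly 0.4 TB to 4 TB, which is why streaming or per-language subsets are usually preferable to downloading everything.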

