CohereLabs/wikipedia-2023-11-embed-multilingual-v3-int8-binary

General NLP, English

The CohereLabs/wikipedia-2023-11-embed-multilingual-v3-int8-binary dataset is a General NLP resource tagged as English, published by CohereLabs in 2024. It has 6.2K downloads and 49 likes, and it falls in the 100M<n<1B size category.

About CohereLabs/wikipedia-2023-11-embed-multilingual-v3-int8-binary

Multilingual Embeddings for Wikipedia in 300+ Languages (int8 & binary embeddings)

This dataset contains the wikimedia/wikipedia dump from 2023-11-01, covering Wikipedia in all 300+ languages. The embeddings are provided as int8 and ubinary ...
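
To make the "ubinary" format more concrete, here is a minimal sketch of how packed binary embeddings are commonly built and compared. It assumes that ubinary means one bit per dimension (1 if the value is positive), packed eight dimensions per unsigned byte, and that candidates are ranked by Hamming distance. The function names (`pack_ubinary`, `hamming_distance`, `rank_by_hamming`) and the 8-dimensional toy vectors are hypothetical and are not taken from the dataset or from any Cohere API.

```python
from typing import List, Sequence


def pack_ubinary(vector: Sequence[float]) -> bytes:
    """Threshold each dimension at 0 and pack 8 bits per byte, most significant bit first."""
    if len(vector) % 8 != 0:
        raise ValueError("dimension must be a multiple of 8")
    packed = bytearray()
    for start in range(0, len(vector), 8):
        byte = 0
        for value in vector[start:start + 8]:
            # Shift in one bit per dimension: 1 for positive values, 0 otherwise.
            byte = (byte << 1) | (1 if value > 0 else 0)
        packed.append(byte)
    return bytes(packed)


def hamming_distance(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two packed embeddings differ."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def rank_by_hamming(query: bytes, docs: List[bytes]) -> List[int]:
    """Return document indices ordered from closest to farthest."""
    return sorted(range(len(docs)), key=lambda i: hamming_distance(query, docs[i]))


# Toy 8-dimensional float vectors (made up for illustration only).
query_vec = [0.3, -0.1, 0.8, -0.5, 0.2, 0.9, -0.7, 0.1]
doc_vecs = [
    [-0.2, 0.4, -0.6, 0.1, -0.3, -0.8, 0.5, -0.9],
    [0.1, -0.2, 0.5, -0.4, 0.3, 0.7, -0.1, -0.2],
    [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5],
]

query_bits = pack_ubinary(query_vec)
doc_bits = [pack_ubinary(v) for v in doc_vecs]
ranking = rank_by_hamming(query_bits, doc_bits)
print(ranking)
```

Under these assumptions, each embedding takes one byte per eight dimensions, so a binary index is far smaller than a float32 index at the cost of some retrieval precision. A common follow-up is to rescore the top binary candidates with the int8 embeddings.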

Details

Task: General NLP
Language: English
Format: Parquet
Rows / instances: N/A
Size: 100M<n<1B
Creator: CohereLabs
Year: 2024
Downloads: 6155
Likes: 49