CohereLabs/wikipedia-2023-11-embed-multilingual-v3-int8-binary
General NLP · English
The CohereLabs/wikipedia-2023-11-embed-multilingual-v3-int8-binary dataset is an English General NLP resource published by CohereLabs in 2024. With 6.2K downloads and 49 likes, it sees regular community use, and its size falls in the 100M<n<1B range.
About CohereLabs/wikipedia-2023-11-embed-multilingual-v3-int8-binary
Multilingual Embeddings for Wikipedia in 300+ Languages (int8 & binary embeddings)
This dataset contains the wikimedia/wikipedia dataset dump from 2023-11-01, covering Wikipedia in all 300+ languages. The embeddings are provided as int8 and ubinary ...
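To make the two embedding formats more concrete, here is a minimal sketch of how a float vector could be reduced to int8 and to packed binary form. It assumes that "ubinary" means one sign bit per dimension packed into unsigned bytes, and that int8 values come from a simple symmetric scaling; the exact quantization used for this dataset is not stated here, and the function names `to_int8`, `to_ubinary`, and `hamming_distance` are hypothetical helpers written for illustration only.

```python
def to_int8(vec, max_abs):
    """Scale floats in [-max_abs, max_abs] to integers in [-127, 127].

    Assumption: symmetric scaling with a caller-supplied range; the
    calibration actually used for the dataset may differ.
    """
    out = []
    for x in vec:
        q = round(x / max_abs * 127)
        out.append(max(-127, min(127, q)))  # clamp out-of-range values
    return out


def to_ubinary(vec):
    """Pack the sign of each dimension into bytes, 8 dimensions per byte.

    Assumption: a positive value maps to bit 1, anything else to bit 0,
    with the first dimension in the most significant bit.
    """
    if len(vec) % 8 != 0:
        raise ValueError("dimension must be a multiple of 8")
    packed = bytearray()
    for i in range(0, len(vec), 8):
        byte = 0
        for x in vec[i:i + 8]:
            byte = (byte << 1) | (1 if x > 0 else 0)
        packed.append(byte)
    return bytes(packed)


def hamming_distance(a, b):
    """Count differing bits between two equally long packed vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


if __name__ == "__main__":
    doc = [0.3, -0.1, 0.7, -0.2, 0.0, 0.5, -0.9, 0.1]
    query = [0.2, 0.4, 0.6, -0.3, -0.1, 0.5, -0.8, -0.2]
    print(to_int8(doc, max_abs=1.0))
    print(hamming_distance(to_ubinary(doc), to_ubinary(query)))
```

Under these assumptions, a binary vector stores 1 bit per dimension instead of the 32 bits of a float32 value, and an int8 vector stores 8 bits, so the formats trade some precision for a much smaller index. Packed binary vectors are typically compared with Hamming distance, as in the last helper.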
Details
- Task: General NLP
- Language: English
- Format: Parquet
- Rows / instances: N/A
- Size: 100M<n<1B
- Creator: CohereLabs
- Year: 2024
- Downloads: 6155
- Likes: 49