CohereLabs/wikipedia-2023-11-embed-multilingual-v3
General NLP · English
The CohereLabs/wikipedia-2023-11-embed-multilingual-v3 dataset is listed as an English General NLP resource, published by CohereLabs in 2024. It has 10.9K downloads and 248 likes, and its size falls in the 100M<n<1B category.
About CohereLabs/wikipedia-2023-11-embed-multilingual-v3
Multilingual Embeddings for Wikipedia in 300+ Languages
This dataset contains the 2023-11-01 dump of the wikimedia/wikipedia dataset, covering Wikipedia in all 300+ languages.
The individual articles have been chunked and embedded with the state-of-t...
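Because the articles are stored as embedded chunks, a typical use is semantic search: embed a query and rank chunks by similarity to it. The following is a minimal sketch of that ranking step only, not the dataset's official usage. The row fields `title`, `text`, and `emb`, the toy 3-dimensional vectors, and the helper names `cosine` and `top_k` are all assumptions made for illustration; real rows would carry embeddings produced by the embedding model, and the query would need to be embedded with the same model.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_k(query_emb, rows, k=2):
    """Return the k rows whose 'emb' vector is most similar to query_emb."""
    ranked = sorted(rows, key=lambda r: cosine(query_emb, r["emb"]), reverse=True)
    return ranked[:k]

# Hypothetical rows: field names and toy embeddings are assumptions, not real data.
rows = [
    {"title": "Paris", "text": "Paris is the capital of France.", "emb": [0.9, 0.1, 0.0]},
    {"title": "Berlin", "text": "Berlin is the capital of Germany.", "emb": [0.1, 0.9, 0.0]},
    {"title": "Seine", "text": "The Seine flows through Paris.", "emb": [0.8, 0.2, 0.1]},
]

# Stand-in for a query embedding; in practice it comes from the same embedding model.
query_emb = [1.0, 0.0, 0.0]

results = top_k(query_emb, rows, k=2)
for row in results:
    print(row["title"], round(cosine(query_emb, row["emb"]), 3))
```

At the 100M<n<1B scale a full sort like this would be impractical; an approximate nearest-neighbor index would normally take the place of `top_k`.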
Details
- Task: General NLP
- Language: English
- Format: Parquet
- Rows / instances: N/A
- Size: 100M<n<1B
- Creator: CohereLabs
- Year: 2024
- Downloads: 10926
- Likes: 248