Shitao/MLDR
Text Retrieval | AR, DE, EN
Shitao/MLDR is a text retrieval dataset in AR, DE, and EN, distributed in Parquet format.
About Shitao/MLDR
Dataset Summary
MLDR is a Multilingual Long-Document Retrieval dataset built on Wikipedia, Wudao and mC4, covering 13 typologically diverse languages. Specifically, we sample lengthy articles from the Wikipedia, Wudao and mC4 datasets and randomly ...
Details
- Task: Text Retrieval
- Language: AR, DE, EN
- Format: Parquet
- Rows / instances: N/A
- Creator: Shitao
- Year: 2024