Shitao/MLDR

Text Retrieval (AR, DE, EN)

Shitao/MLDR is a text retrieval dataset in AR, DE, and EN, distributed in Parquet format.

About Shitao/MLDR

Dataset Summary: MLDR is a Multilingual Long-Document Retrieval dataset built on Wikipedia, Wudao, and mC4, covering 13 typologically diverse languages. Specifically, we sample lengthy articles from Wikipedia, Wudao and mC4 datasets and randomly ...

Details

Task: Text Retrieval
Language: AR, DE, EN
Format: Parquet
Rows / instances: N/A
Creator: Shitao
Year: 2024
