hltcoe/megawika
Summarization, Question Answering, Text Generation | AF, AR, AZ | cc-by-sa-4.0
hltcoe/megawika is a dataset tagged for summarization, question answering, and text generation, with AF, AR, and AZ as its listed languages. It is distributed in Parquet format under the cc-by-sa-4.0 license, falls in the 10M<n<100M size category, and has been downloaded 19.4K times.
About hltcoe/megawika
MegaWika is a multi- and crosslingual text dataset containing 30 million
Wikipedia passages with their scraped and cleaned web citations. The
passages span 50 Wikipedias in 50 languages, and the articles in which
the passages were originally embed...
Details
- Task: Summarization, Question Answering, Text Generation
- Language: AF, AR, AZ
- Format: Parquet
- Rows / instances: N/A
- Size: 10M<n<100M
- Creator: hltcoe
- Year: 2026
- License: cc-by-sa-4.0
- Downloads: 19405
- Likes: 41
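
The size entry above, 10M<n<100M, is a bucket rather than an exact count, which is why the row count itself is listed as N/A. As a minimal sketch, the snippet below turns such a tag into numeric bounds. The helper names (`parse_size_category`, `_to_int`) and the set of accepted suffixes (K, M, B, T) are assumptions made for illustration and are not part of the dataset or of any library.

```python
import re

# Multipliers for the suffixes assumed to appear in size-category tags.
_SUFFIX = {"": 1, "K": 10**3, "M": 10**6, "B": 10**9, "T": 10**12}


def _to_int(token: str) -> int:
    """Convert a token such as '10M' or '100' into an integer."""
    match = re.fullmatch(r"(\d+)([KMBT]?)", token.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized size token: {token!r}")
    return int(match.group(1)) * _SUFFIX[match.group(2)]


def parse_size_category(tag: str) -> tuple[int, int]:
    """Return the (lower, upper) exclusive bounds of a tag like '10M<n<100M'."""
    parts = tag.split("<")
    if len(parts) != 3 or parts[1].strip().lower() != "n":
        raise ValueError(f"expected the form 'LOW<n<HIGH', got {tag!r}")
    return _to_int(parts[0]), _to_int(parts[2])


low, high = parse_size_category("10M<n<100M")
print(low, high)

# The description mentions 30 million passages; if n counts passages
# (an assumption), that figure sits inside the bucket.
print(low < 30_000_000 < high)
```

Both bounds are treated as exclusive because the tag uses strict inequalities; what n actually counts (rows, passages, or articles) is not stated in the listing.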