Question 1

What is the WikiText-103 & 2 dataset?

Accepted Answer

WikiText-103 & 2 is an English language modeling dataset from Merity et al., distributed in two sizes (WikiText-103 and WikiText-2) as tokenized text. The catalog entry lists 100 records in TOKENS format.

Question 2

Is WikiText-103 & 2 a benchmark?

Accepted Answer

WikiText-103 & 2 is a dataset for training or evaluation; it isn't tracked as a standard LLM benchmark in our catalog.

Question 3

Where can I download WikiText-103 & 2?

Accepted Answer

WikiText-103 & 2 is available from its source page: https://blog.einstein.ai/the-wikitext-long-term-dependency-language-modeling-dataset/
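
Once the files are downloaded, a quick sanity check is to count tokens. The following is a minimal sketch, not an official loader. It assumes the tokenized files are plain text with whitespace-separated tokens, article headings written as " = Title = ", and rare words replaced by <unk>. The sample text and the helper names (is_heading, count_tokens) are made up for illustration and are not taken from the dataset.

```python
from collections import Counter

# Hypothetical sample written in the WikiText style (not real dataset text):
# headings are wrapped in "=" signs, tokens are separated by spaces,
# and rare words appear as <unk>.
SAMPLE = """
 = Example Title =

 The cat sat on the mat . The cat slept .

 = = Section = =

 A <unk> appeared .
"""


def is_heading(line: str) -> bool:
    """Return True for heading lines such as ' = Title = ' (assumed convention)."""
    stripped = line.strip()
    return len(stripped) > 1 and stripped.startswith("=") and stripped.endswith("=")


def count_tokens(text: str) -> Counter:
    """Count whitespace-separated tokens, skipping blank lines and headings."""
    counts = Counter()
    for line in text.splitlines():
        if not line.strip() or is_heading(line):
            continue
        counts.update(line.split())
    return counts


counts = count_tokens(SAMPLE)
total_tokens = sum(counts.values())
print(total_tokens, counts.most_common(3))
```

To run this on a downloaded file, read the file contents into a string and pass it to count_tokens in place of SAMPLE. Whether headings should count as tokens depends on the evaluation protocol you are following, so treat the skip as a choice rather than a rule.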

WikiText-103 & 2

Details