Skylion007/openwebtext
Text Generation, Fill Mask, EN
Skylion007/openwebtext is an English-language dataset for text generation and fill-mask tasks, distributed in Parquet format.
About Skylion007/openwebtext
Dataset Card for "openwebtext"
Dataset Summary
An open-source replication of OpenAI's WebText dataset, which was used to train GPT-2.
This distribution was created by Aaron Gokaslan and Vanya Cohen of Brown University.
...
Details
- Task: Text Generation, Fill Mask
- Language: EN
- Format: Parquet
- Rows / instances: N/A
- Creator: Skylion007
- Year: 2022
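
Given the repository name and Parquet format listed above, one way to inspect the data is to stream it with the Hugging Face `datasets` library rather than downloading it in full. The following is a minimal sketch, not an official loading recipe: the split name `train` and the column name `text` are assumptions that this card does not state, and the helper names `preview_texts` and `stream_openwebtext` are hypothetical.

```python
from itertools import islice

# Repository name as given on this card.
DATASET_ID = "Skylion007/openwebtext"


def preview_texts(rows, n=3, text_field="text"):
    """Return the first n text values from an iterable of row dicts.

    `text_field` defaults to "text", which is an assumed column name.
    """
    return [row[text_field] for row in islice(rows, n)]


def stream_openwebtext(split="train"):
    """Open the dataset in streaming mode so nothing is downloaded up front.

    Requires `pip install datasets`; the split name "train" is an assumption.
    """
    # Imported lazily so preview_texts works without the library installed.
    from datasets import load_dataset

    return load_dataset(DATASET_ID, split=split, streaming=True)
```

Under those assumptions, `preview_texts(stream_openwebtext())` would fetch the first three documents over the network. Streaming is used here because the card gives no row count, so the full download size is unknown.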