legacy-datasets/c4
Text Generation, Fill Mask, EN
The legacy-datasets/c4 dataset is an English (EN) text generation and fill-mask resource published by legacy-datasets in 2022.
About legacy-datasets/c4
A colossal, cleaned version of Common Crawl's web crawl corpus.
Based on the Common Crawl dataset (https://commoncrawl.org).
This is AllenAI's processed version of Google's C4 dataset.
Details
- Task: Text Generation, Fill Mask
- Language: EN
- Format: Parquet
- Rows / instances: N/A
- Creator: legacy-datasets
- Year: 2022