legacy-datasets/c4

Text Generation, Fill Mask, EN

The legacy-datasets/c4 dataset is an English (EN) text generation and fill-mask resource published by legacy-datasets in 2022.

About legacy-datasets/c4

A colossal, cleaned version of Common Crawl's web crawl corpus, based on the Common Crawl dataset (https://commoncrawl.org). This is AllenAI's processed version of Google's C4 dataset.

Details

Task: Text Generation, Fill Mask
Language: EN
Format: Parquet
Rows / instances: N/A
Creator: legacy-datasets
Year: 2022