BramVanroy/CommonCrawl-CreativeCommons

Text Generation | AFR, DEU, ENG

BramVanroy/CommonCrawl-CreativeCommons is a text generation dataset listed with Afrikaans (AFR), German (DEU), and English (ENG) text and distributed in Parquet format. It is released under a cc license, falls in the 100M<n<1B size category, and has been downloaded about 2.2K times.

About BramVanroy/CommonCrawl-CreativeCommons

The Common Crawl Creative Commons Corpus (C5)

Raw CommonCrawl crawls, annotated with Creative Commons license information.

C5 is an effort to collect Creative Commons-licensed web data in one place. The licensing information is extracted from ...

Details

Task: Text Generation
Language: AFR, DEU, ENG
Format: Parquet
Rows / instances: N/A
Size: 100M<n<1B
Creator: BramVanroy
Year: 2025
License: cc
Downloads: 2197
Likes: 36
