BramVanroy/CommonCrawl-CreativeCommons
Text Generation | AFR, DEU, ENG
BramVanroy/CommonCrawl-CreativeCommons is a text generation dataset covering Afrikaans (AFR), German (DEU), and English (ENG), distributed in Parquet format under a Creative Commons (cc) license. It falls in the 100M<n<1B size category and has been downloaded about 2.2K times.
About BramVanroy/CommonCrawl-CreativeCommons
The Common Crawl Creative Commons Corpus (C5)
Raw CommonCrawl crawls, annotated with Creative Commons license information
C5 is an effort to collect Creative Commons-licensed web data in one place.
The licensing information is extracted from ...
Details
- Task: Text Generation
- Language: AFR, DEU, ENG
- Format: Parquet
- Rows / instances: N/A
- Size: 100M<n<1B
- Creator: BramVanroy
- Year: 2025
- License: cc
- Downloads: 2197
- Likes: 36
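
Because the corpus is large (100M<n<1B) and stored as Parquet, it can be useful to peek at a few rows before committing to a full download. The following is a minimal sketch, assuming the repository id above refers to a Hugging Face Hub dataset and that the `datasets` library is installed. The `config_name` argument and the `"train"` split are assumptions: this card does not list the available subsets, splits, or column names, so check the dataset page for the real values.

```python
from typing import Any, Dict, Iterator, Optional

# Repository id taken from the card above.
REPO_ID = "BramVanroy/CommonCrawl-CreativeCommons"


def stream_rows(
    config_name: Optional[str] = None,  # hypothetical: subset names are not listed in this card
    split: str = "train",               # assumption: the split name is not stated in this card
    limit: int = 3,
) -> Iterator[Dict[str, Any]]:
    """Yield up to `limit` rows without downloading the full corpus."""
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    # streaming=True reads the Parquet shards on the fly instead of caching them all.
    dataset = load_dataset(REPO_ID, name=config_name, split=split, streaming=True)
    for row in dataset.take(limit):
        yield row
```

Iterating over `stream_rows("<config>")` with a real subset name and printing `row.keys()` would reveal the column names, including whichever fields hold the license annotations.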