coral-nlp/german-commons
coral-nlp/german-commons is a German-language (DE) text generation dataset published by coral-nlp in Parquet format. It is distributed under the odc-by license, falls in the 10M<n<100M size category, and has been downloaded 2,014 times.
About coral-nlp/german-commons
German Commons - 154 Billion Tokens of Openly Licensed Text for German Language Models
A comprehensive collection of German-language text data under open licenses for training German language models.
Datasheet: DATASHEET.md.
Paper: arxiv.org/a...
Details
- Task: Text Generation
- Language: DE
- Format: Parquet
- Rows / instances: N/A
- Size: 10M<n<100M
- Creator: coral-nlp
- Year: 2025
- License: odc-by
- Downloads: 2014
- Likes: 38
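
Because the data is published as Parquet under the repository name coral-nlp/german-commons, a few records can be inspected without downloading the whole collection. The following is a minimal sketch, assuming the repository is hosted on the Hugging Face Hub and that the `datasets` library is installed. The helper name `peek` is hypothetical, and the choice of the first listed configuration and split is an assumption, since this page does not describe the dataset's configurations, splits, or column names.

```python
from itertools import islice

# Repository name as given on this page.
DATASET_ID = "coral-nlp/german-commons"


def peek(dataset_id: str = DATASET_ID, n: int = 3) -> list[dict]:
    """Stream the first n records instead of downloading the full dataset.

    Hypothetical helper: the configuration and split are not documented on
    this page, so the first ones reported by the Hub are used as a guess.
    """
    # Imported lazily so this module can be imported without `datasets`.
    from datasets import (
        get_dataset_config_names,
        get_dataset_split_names,
        load_dataset,
    )

    # Assumption: the first configuration and split are representative.
    config = get_dataset_config_names(dataset_id)[0]
    split = get_dataset_split_names(dataset_id, config)[0]

    # streaming=True yields records lazily over the network.
    stream = load_dataset(dataset_id, config, split=split, streaming=True)
    return list(islice(stream, n))
```

Calling `peek()` requires network access and returns a list of up to three records as dictionaries; printing `sorted(record.keys())` for one of them shows which columns are actually available. Consult DATASHEET.md for the authoritative description of the fields and of the per-source licenses.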