allenai/dolma
allenai/dolma is an English (EN) text generation dataset from allenai, distributed in Parquet format under the odc-by license. It falls in the n>1T size category and has been downloaded 4.4K times.
About allenai/dolma
Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research
Details
- Task: Text Generation
- Language: EN
- Format: Parquet
- Rows / instances: N/A
- Size: n>1T
- Creator: allenai
- Year: 2023
- License: odc-by
- Downloads: 4449
- Likes: 1048