jobs-git/Zyda-2
Text Generation | EN | odc-by
The jobs-git/Zyda-2 dataset is an English (EN) text generation resource published by jobs-git in 2025. It has 162.8K downloads and 1 like. It is released under the odc-by license and falls in the n>1T size category.
About jobs-git/Zyda-2
Zyda-2
Zyda-2 is a 5 trillion token language modeling dataset created by collecting open, high-quality datasets and combining them through cross-deduplication and model-based quality filtering. Zyda-2 comprises diverse sources of web data, hig...
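To make the idea of cross-deduplication concrete, here is a minimal sketch in Python. It is not the pipeline used to build Zyda-2: the description above does not say how duplicates are detected, so this example assumes the simplest possible approach, exact matching on a hash of normalized text. Large-scale pipelines commonly use fuzzy matching instead. The function names `normalize` and `cross_deduplicate` and the toy source names are hypothetical.

```python
import hashlib
from typing import Dict, Iterable, List, Set


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants hash identically."""
    return " ".join(text.lower().split())


def cross_deduplicate(sources: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Keep only the first occurrence of each document across all sources.

    Sources are visited in insertion order, so a document that appears in
    several sources survives only in the earliest one.
    Assumption: exact-hash matching stands in for whatever method the
    dataset authors actually used.
    """
    seen: Set[str] = set()
    kept: Dict[str, List[str]] = {}
    for name, docs in sources.items():
        kept[name] = []
        for doc in docs:
            digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
            if digest not in seen:
                seen.add(digest)
                kept[name].append(doc)
    return kept


# Hypothetical toy corpora, not real Zyda-2 components.
toy_sources = {
    "source_a": ["The cat sat.", "Hello   world"],
    "source_b": ["hello world", "A new document."],
}
result = cross_deduplicate(toy_sources)
```

In this toy run, "hello world" in `source_b` is dropped because a normalized match already appeared in `source_a`. The order in which sources are visited therefore decides which copy is kept, which is a design choice any real pipeline would need to make deliberately.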
Details
- Task: Text Generation
- Language: EN
- Format: Parquet
- Rows / instances: N/A
- Size: n>1T
- Creator: jobs-git
- Year: 2025
- License: odc-by
- Downloads: 162767
- Likes: 1