jobs-git/Zyda-2

Text Generation | EN | odc-by

The jobs-git/Zyda-2 dataset is an English (EN) text generation resource published by jobs-git in 2025. It has 162.8K downloads and 1 like. It is released under the odc-by license and falls in the n>1T size category.

About jobs-git/Zyda-2

Zyda-2 is a 5-trillion-token language modeling dataset created by collecting open, high-quality datasets and combining them through cross-deduplication and model-based quality filtering. Zyda-2 comprises diverse sources of web data, hig...
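
The description above names two processing steps, cross-deduplication and model-based quality filtering, without giving details. The sketch below illustrates the general idea only and is not the Zyda-2 pipeline: it assumes exact matching on normalized text (a production pipeline might use fuzzy matching instead), and the function names `normalize`, `cross_deduplicate`, and `quality_filter`, as well as the `score` callable standing in for a quality model, are hypothetical.

```python
import hashlib
import re
from typing import Callable


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies match."""
    return re.sub(r"\s+", " ", text.strip().lower())


def cross_deduplicate(sources: dict[str, list[str]]) -> dict[str, list[str]]:
    """Keep only the first occurrence of each document across all sources.

    Sources are visited in insertion order, so earlier sources take priority
    when the same document appears in more than one of them.
    """
    seen: set[str] = set()
    kept: dict[str, list[str]] = {}
    for name, docs in sources.items():
        kept[name] = []
        for doc in docs:
            digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
            if digest not in seen:
                seen.add(digest)
                kept[name].append(doc)
    return kept


def quality_filter(
    docs: list[str], score: Callable[[str], float], threshold: float
) -> list[str]:
    """Keep documents whose score meets the threshold.

    `score` is a placeholder for a learned quality classifier (assumption).
    """
    return [doc for doc in docs if score(doc) >= threshold]
```

Hashing the normalized text keeps memory use proportional to the number of unique documents rather than to their total length, which matters when the same page appears in several source datasets.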

Details

Task: Text Generation
Language: EN
Format: Parquet
Rows / instances: N/A
Size: n>1T
Creator: jobs-git
Year: 2025
License: odc-by
Downloads: 162767
Likes: 1