Zyphra/Zyda-2

Text Generation | EN

Zyphra/Zyda-2 is an English (EN) text generation dataset from Zyphra, distributed in Parquet format.

About Zyphra/Zyda-2

Zyda-2 is a 5-trillion-token language modeling dataset created by collecting open, high-quality datasets and combining them through cross-deduplication and model-based quality filtering. Zyda-2 comprises diverse sources of web data, hig...
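
To make the two processing steps named above more concrete, here is a minimal sketch of cross-deduplication followed by score-based filtering. It is an illustration only, not Zyphra's pipeline: exact SHA-256 hashing of normalized text stands in for whatever deduplication method was actually used, and `toy_score` is a hypothetical placeholder for a learned quality model. All function names and the threshold value are assumptions introduced for this example.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def cross_deduplicate(sources: dict[str, list[str]]) -> dict[str, list[str]]:
    """Keep only the first occurrence of each document across all sources.

    Sources are visited in insertion order, so earlier sources take priority.
    Assumption: exact-match hashing; a real pipeline may use fuzzy matching.
    """
    seen: set[str] = set()
    kept: dict[str, list[str]] = {name: [] for name in sources}
    for name, docs in sources.items():
        for doc in docs:
            digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
            if digest not in seen:
                seen.add(digest)
                kept[name].append(doc)
    return kept


def toy_score(doc: str) -> float:
    """Hypothetical quality score: share of alphabetic or whitespace characters."""
    if not doc:
        return 0.0
    return sum(ch.isalpha() or ch.isspace() for ch in doc) / len(doc)


def quality_filter(docs: list[str], score_fn, threshold: float) -> list[str]:
    """Keep documents whose score is at or above the threshold."""
    return [doc for doc in docs if score_fn(doc) >= threshold]


if __name__ == "__main__":
    sources = {
        "source_a": ["Hello  world", "A clean sentence"],
        "source_b": ["hello world", "1234 !!!"],
    }
    deduped = cross_deduplicate(sources)
    for name, docs in deduped.items():
        print(name, quality_filter(docs, toy_score, threshold=0.8))
```

Visiting sources in a fixed order means a duplicate is always dropped from the later source, which is one simple way to express a preference between overlapping corpora.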

Details

Task: Text Generation
Language: EN
Format: Parquet
Rows / instances: N/A
Creator: Zyphra
Year: 2024