Zyphra/Zyda

Text Generation · EN

The Zyphra/Zyda dataset is an English text generation resource from Zyphra, released in 2024.

About Zyphra/Zyda

Zyda is a 1.3T-token language modeling dataset created by collecting open, high-quality datasets, combining them, and applying a uniform filtering and deduplication step. We find that Zyda performs extremely well in ablations and is at ...
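
The combine, filter, and deduplicate steps described above can be illustrated with a minimal Python sketch. This is not Zyphra's actual pipeline: the word-count filter, the `min_words` threshold, the exact-hash deduplication, and all function names are assumptions chosen only to show the general shape of such a step.

```python
import hashlib


def normalize(text: str) -> str:
    # Lowercase and collapse whitespace so trivially different copies hash alike.
    return " ".join(text.lower().split())


def passes_filter(text: str, min_words: int = 5) -> bool:
    # Hypothetical quality rule: keep only documents with enough words.
    return len(text.split()) >= min_words


def combine_filter_dedup(sources, min_words: int = 5):
    # Apply one uniform filter and one shared dedup set across all sources.
    seen = set()
    kept = []
    for source in sources:
        for doc in source:
            if not passes_filter(doc, min_words):
                continue
            digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
            if digest in seen:
                continue  # exact duplicate (after normalization) of an earlier doc
            seen.add(digest)
            kept.append(doc)
    return kept


# Two toy "component datasets" standing in for the collected sources.
source_a = ["The quick brown fox jumps over the lazy dog.", "too short"]
source_b = [
    "the quick  brown fox jumps over the lazy dog.",
    "A second document with enough words in it.",
]
result = combine_filter_dedup([source_a, source_b])
```

Because the filter and the `seen` set are shared across sources, a document that appears in two component datasets is kept only once. Exact hashing misses near-duplicates; large-scale pipelines usually rely on fuzzy methods for that.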

Details

Task: Text Generation
Language: EN
Format: Parquet
Rows / instances: N/A
Creator: Zyphra
Year: 2024