Zyphra/Zyda-2
Text Generation | EN
Zyphra/Zyda-2 is an English (EN) text generation dataset from Zyphra, distributed in Parquet format.
About Zyphra/Zyda-2
Zyda-2
Zyda-2 is a 5 trillion token language modeling dataset created by collecting open, high-quality datasets and combining them, then applying cross-deduplication and model-based quality filtering. Zyda-2 comprises diverse sources of web data, hig...
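The description names two processing steps, cross-deduplication and model-based quality filtering, without saying how they are implemented. As a rough illustration only, the toy sketch below removes documents that repeat across sources using an exact hash of normalized text, then keeps documents that score above a threshold. Everything in it is an assumption made for the example: the function names, the sample sources, the threshold, and in particular `toy_score`, which is a hypothetical stand-in for a learned quality model. It is not Zyda-2's actual pipeline.

```python
import hashlib
from typing import Callable, Dict, Iterable, List


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants hash identically."""
    return " ".join(text.lower().split())


def cross_deduplicate(sources: Dict[str, Iterable[str]]) -> List[dict]:
    """Keep the first occurrence of each document across all sources.

    Assumption: exact-match hashing of normalized text. Fuzzy matching
    would catch near-duplicates that this approach misses.
    """
    seen = set()
    kept = []
    for name, docs in sources.items():
        for text in docs:
            digest = hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()
            if digest in seen:
                continue  # already kept from this or an earlier source
            seen.add(digest)
            kept.append({"source": name, "text": text})
    return kept


def quality_filter(
    docs: List[dict], score: Callable[[str], float], threshold: float
) -> List[dict]:
    """Keep documents whose score is at or above the threshold."""
    return [d for d in docs if score(d["text"]) >= threshold]


def toy_score(text: str) -> float:
    """Hypothetical stand-in for a quality model: share of purely alphabetic words."""
    words = text.split()
    if not words:
        return 0.0
    return sum(w.isalpha() for w in words) / len(words)


# Made-up sample data, for illustration only.
sources = {
    "source_a": ["The cat sat on the mat", "1234 5678 !!!!"],
    "source_b": ["the  cat sat on the MAT", "Parquet is a columnar file format"],
}

deduped = cross_deduplicate(sources)
filtered = quality_filter(deduped, toy_score, threshold=0.5)

for doc in filtered:
    print(doc["source"], "|", doc["text"])
```

In this sketch the order of `sources` decides which copy of a duplicate survives, so a real pipeline would need an explicit rule for which source to prefer.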
Details
- Task: Text Generation
- Language: EN
- Format: Parquet
- Rows / instances: N/A
- Creator: Zyphra
- Year: 2024