
jhu-clsp/ettin-pretraining-data

Text Generation, Fill Mask, Text Classification, EN, mit

jhu-clsp/ettin-pretraining-data is an English-language (EN) dataset focused on text generation, distributed in Parquet format under the MIT license. It has been downloaded about 165.9K times.

About jhu-clsp/ettin-pretraining-data

Ettin Pre-training Data (Phase 1 of 3): a diverse pre-training data mixture of 1.7T tokens used to train the Ettin model suite. This dataset contains the pre-training-phase data used to train all Ettin encoder and decoder models. The data is p...

Details

Task: Text Generation, Fill Mask, Text Classification
Language: EN
Format: Parquet
Rows / instances: N/A
Creator: jhu-clsp
Year: 2024
License: MIT
Downloads: 165,903
Likes: 9