nyu-mll/roberta-med-small-1M-3

nyu-mll/roberta-med-small-1M-3 is a machine learning model.

About nyu-mll/roberta-med-small-1M-3

We release the 3 models with the lowest perplexities for each pretraining data size, out of 25 runs (or 10 in the case of 1B tokens). The pretraining data reproduces that of BERT: we combine English Wikipedia and a reproduction of BookCorpus, built from Smashwords texts, in a ratio of approximately 3:1. The hyperparameters and validation perplexities corresponding to each model are listed using the following abbreviations: AH = number of attention heads; HS = hidden size; FFN = feedforward network dimension; P = number of parameters. For the remaining hyperparameters, we select a peak learning rate of 5e-4 and warmup steps equal to 6% of max steps.
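As a rough illustration of the two training settings quoted above (peak learning rate 5e-4, warmup over 6% of max steps), the following sketch computes the warmup length and a learning rate for a given step. The function names, the `max_steps` value, the linear shape of the warmup, and holding the rate at its peak after warmup are all assumptions made for the example; the description above does not state the schedule shape or the total number of steps.

```python
def warmup_steps(max_steps: int, warmup_frac: float = 0.06) -> int:
    """Number of warmup steps as a fraction of the total step budget."""
    return int(round(max_steps * warmup_frac))


def learning_rate(step: int, max_steps: int, peak_lr: float = 5e-4,
                  warmup_frac: float = 0.06) -> float:
    """Learning rate at `step`.

    Assumption: linear warmup from 0 to peak_lr, then held at peak_lr.
    The schedule after warmup is not specified in the source text.
    """
    n_warmup = warmup_steps(max_steps, warmup_frac)
    if n_warmup > 0 and step < n_warmup:
        return peak_lr * step / n_warmup
    return peak_lr


if __name__ == "__main__":
    max_steps = 10_000  # hypothetical value, not taken from the model card
    print("warmup steps:", warmup_steps(max_steps))
    for step in (0, 300, 600, 5_000):
        print(step, learning_rate(step, max_steps))
```

With the hypothetical budget of 10,000 steps, warmup would cover the first 600 steps. A real training run would typically decay the rate after warmup, which this sketch deliberately leaves out.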