Adaptive Input Transformer + RD

Microsoft Research Asia · Soochow University · Language modeling

Adaptive Input Transformer + RD is a language modeling model published by Microsoft Research Asia and Soochow University in 2021, with approximately 247 million parameters.

About Adaptive Input Transformer + RD

Dropout is a powerful and widely used technique to regularize the training of deep neural networks. In this paper, we introduce a simple regularization strategy upon dropout in model training, namely R-Drop, which forces the output distributions of d

Details

Provider
Microsoft Research Asia, Soochow University
Task
Language modeling
Parameters
247,000,000
Released
2021-06-28
Open weights
No