Adaptive Input Transformer + RD
Microsoft Research Asia, Soochow University · Language modeling
Adaptive Input Transformer + RD is a language modeling model published by Microsoft Research Asia and Soochow University in 2021, with 247 million parameters.
About Adaptive Input Transformer + RD
Dropout is a powerful and widely used technique to regularize the training of deep neural networks. In this paper, we introduce a simple regularization strategy upon dropout in model training, namely R-Drop, which forces the output distributions of d
Details
- Provider: Microsoft Research Asia, Soochow University
- Task: Language modeling
- Parameters: 247,000,000
- Released: 2021-06-28
- Open weights: No