Routing Transformer (WT-103)
Google Research | Language modeling | Open weights
Developed by Google Research in 2020, Routing Transformer (WT-103) is a language model with 79.5 million parameters and openly available weights.
About Routing Transformer (WT-103)
Self-attention has recently been adopted for a wide range of sequence modeling problems. Despite its effectiveness, self-attention suffers from quadratic compute and memory requirements with respect to sequence length. Successful approaches to reduce
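The quadratic cost mentioned above can be made concrete with a small sketch. It counts the entries of the full attention score matrix for a single head in a single layer, assuming 4 bytes per entry (float32). The function names and the byte size are illustrative assumptions, not taken from the Routing Transformer code.

```python
def attention_score_entries(seq_len: int) -> int:
    """Entries in the full seq_len x seq_len attention score matrix."""
    return seq_len * seq_len


def attention_score_bytes(seq_len: int, bytes_per_entry: int = 4) -> int:
    """Memory for one score matrix, assuming float32 entries by default."""
    return attention_score_entries(seq_len) * bytes_per_entry


if __name__ == "__main__":
    # Doubling the sequence length quadruples the score-matrix size.
    for n in (1024, 2048, 4096):
        print(n, attention_score_entries(n), attention_score_bytes(n))
```

Under these assumptions, each doubling of the sequence length multiplies the score-matrix memory by four, which is the growth that dense self-attention incurs.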
Details
- Provider: Google Research
- Task: Language modeling
- Parameters: 79,500,000
- Released: 2020-03-12
- Open weights: Yes