GPT-2 Medium (FlashAttention)
Stanford University | University at Buffalo | Language modeling/generation | Open weights
Developed by Stanford University and the University at Buffalo in 2022, GPT-2 Medium (FlashAttention) is a language modeling/generation model with 355 million parameters and openly available weights.
About GPT-2 Medium (FlashAttention)
Transformers are slow and memory-hungry on long sequences, since the time and memory complexity of self-attention are quadratic in sequence length. Approximate attention methods have attempted to address this problem by trading off model quality to r…
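To make the quadratic growth concrete, here is a minimal sketch that estimates the memory needed to materialize the full attention score matrix in a standard (non-fused) attention implementation. The function name `attention_matrix_bytes` is hypothetical, and the defaults are assumptions not stated on this page: 16 attention heads (the usual GPT-2 Medium configuration) and 2 bytes per element (fp16). The figure is per layer and per batch element, and ignores all other activations.

```python
def attention_matrix_bytes(seq_len: int, num_heads: int = 16, bytes_per_elem: int = 2) -> int:
    """Bytes needed to store one seq_len x seq_len score matrix per head.

    Assumed defaults (not taken from this page): 16 heads, fp16 storage.
    """
    return num_heads * seq_len * seq_len * bytes_per_elem


if __name__ == "__main__":
    # Doubling the sequence length quadruples the score-matrix memory.
    for n in (1024, 2048, 4096):
        mib = attention_matrix_bytes(n) / 2**20
        print(f"seq_len={n:5d}  score matrix per layer: {mib:7.1f} MiB")
```

Under these assumptions, each doubling of the sequence length multiplies the score-matrix footprint by four, which is the scaling behavior the paragraph above describes.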
Details
- Provider: Stanford University, University at Buffalo
- Task: Language modeling/generation
- Parameters: 355,000,000
- Released: 2022-05-27
- Open weights: Yes