GPT-2 Medium (FlashAttention)

Stanford University, University at Buffalo, Language modeling/generation, Open weights

Developed by Stanford University and the University at Buffalo in 2022, GPT-2 Medium (FlashAttention) is a language modeling/generation model with 355 million parameters and openly available weights.

About GPT-2 Medium (FlashAttention)

Transformers are slow and memory-hungry on long sequences, since the time and memory complexity of self-attention are quadratic in sequence length. Approximate attention methods have attempted to address this problem by trading off model quality to r
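To make the quadratic cost concrete, here is a minimal sketch of standard (non-fused) softmax attention for a single head in pure Python. It is an illustration only, not the FlashAttention algorithm or this model's implementation; the function names `naive_attention` and `score_matrix_bytes`, and the 2-bytes-per-element default, are assumptions made for this example.

```python
import math


def naive_attention(q, k, v):
    """Standard softmax attention for one head (illustrative sketch).

    q, k, v are lists of N vectors, each of dimension d.
    Returns (output, number of score entries materialized).
    """
    n, d = len(q), len(q[0])
    scale = 1.0 / math.sqrt(d)

    # The full N x N score matrix is built explicitly.
    # This is the term that grows quadratically with sequence length.
    scores = [
        [scale * sum(qi[t] * kj[t] for t in range(d)) for kj in k]
        for qi in q
    ]

    out = []
    for row in scores:
        m = max(row)                          # subtract the max for numerical stability
        w = [math.exp(s - m) for s in row]    # unnormalized softmax weights
        z = sum(w)
        out.append([sum(w[j] * v[j][t] for j in range(n)) / z for t in range(d)])
    return out, n * n


def score_matrix_bytes(seq_len, bytes_per_element=2):
    """Bytes needed to hold one N x N score matrix (per head, per sequence).

    bytes_per_element=2 assumes 16-bit floats; this is an assumption of the sketch.
    """
    return seq_len * seq_len * bytes_per_element


if __name__ == "__main__":
    # Sequence lengths below are arbitrary illustration values.
    for n in (1024, 2048, 4096):
        print(n, score_matrix_bytes(n))
```

Doubling the sequence length quadruples the size of the score matrix, which is why long sequences become expensive when the matrix is materialized in full.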

Details

Provider
Stanford University, University at Buffalo
Task
Language modeling/generation
Parameters
355,000,000
Released
2022-05-27
Open weights
Yes