Rethinking Attention with Performers -- Part I
This article's objective is to present a hand-wavy understanding of how Performers [1] work. Transformers dominate the deep-learning literature in 2022. Unfortunately, Transformers suffer from quadratic complexity in the self-attention layer. This has hindered the use of Transformers on long input signals, i.e., large sequence lengths L. Large sequence lengths are not critical in most NLP applications, since sentences typically have fewer than 40 words. Yet, long sequences are abundant in other applications, such as protein sequencing [1] and high-resolution medical images [4].
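To see where the quadratic cost comes from, here is a minimal NumPy sketch of vanilla softmax attention (not the Performer mechanism itself): the score matrix Q·Kᵀ has shape (L, L), so both time and memory grow as L². The sizes `L` and `d` below are arbitrary toy values.

```python
import numpy as np

def softmax_attention(Q, K, V):
    # Standard dot-product attention. The score matrix is L x L,
    # which is the source of the quadratic cost in sequence length L.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])          # shape (L, L)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ V                               # shape (L, d)

L, d = 1024, 64  # toy sequence length and head dimension
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((L, d)) for _ in range(3))
out = softmax_attention(Q, K, V)
print(out.shape)  # (1024, 64): output is linear in L, but the
                  # intermediate (L, L) score matrix is not
```

Doubling L quadruples the size of the score matrix, which is exactly the bottleneck Performers are designed to remove.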
Oct-11-2022, 09:08:10 GMT