Rethinking Attention with Performers -- Part I


This article's objective is to present a hand-wavy understanding of how Performers [1] work. Transformers dominate the deep-learning literature in 2022. Unfortunately, Transformers suffer from quadratic complexity in the self-attention layer, which has hindered their use on long input signals, i.e., large sequence length L. Long sequences are not critical in most NLP applications, where sentences rarely exceed 40 words. Yet they are abundant in other domains such as protein sequencing [1] and high-resolution medical images [4].
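To make the quadratic cost concrete, here is a minimal NumPy sketch of standard softmax self-attention (not the Performer mechanism). The names and shapes are illustrative; the point is the explicit L x L attention matrix, whose time and memory cost grows as O(L^2) and dominates for large L.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Standard softmax self-attention.

    Q, K, V: arrays of shape (L, d). The intermediate attention
    matrix A has shape (L, L), so cost scales quadratically in L.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)           # (L, L) -- the quadratic bottleneck
    A = np.exp(scores - scores.max(axis=-1, keepdims=True))
    A = A / A.sum(axis=-1, keepdims=True)   # row-wise softmax
    return A @ V                            # (L, d)

# Example: doubling L quadruples the size of the attention matrix.
L, d = 1024, 64
Q, K, V = (np.random.randn(L, d) for _ in range(3))
out = softmax_attention(Q, K, V)            # A alone holds L * L ~ 1M entries
```

Performers sidestep this bottleneck by approximating the softmax kernel so that attention can be computed without ever materializing the L x L matrix; the mechanics of that approximation are the subject of the rest of this series.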
