Improving Autoregressive NLP Tasks via Modular Linearized Attention

Agostinelli, Victor; Chen, Lizhong

arXiv.org Artificial Intelligence 

Many natural language processing (NLP) tasks require models that are efficient and small because they are ultimately deployed at the edge or in other resource-constrained environments. While prior research has reduced the size of these models, increasing computational efficiency without considerable performance impacts remains difficult, especially for autoregressive tasks. This paper proposes modular linearized attention (MLA), which combines multiple efficient attention mechanisms, including cosFormer [36], to maximize inference quality while achieving notable speedups.
