Improving Autoregressive NLP Tasks via Modular Linearized Attention
Agostinelli, Victor, Chen, Lizhong
arXiv.org Artificial Intelligence
Various natural language processing (NLP) tasks require models that are small and efficient because they are ultimately deployed at the edge or in other resource-constrained environments. While prior research has reduced the size of these models, increasing computational efficiency without considerable performance impacts remains difficult, especially for autoregressive tasks. This paper proposes modular linearized attention (MLA), which combines multiple efficient attention mechanisms, including cosFormer [36], to maximize inference quality while achieving notable speedups.
Jun-24-2023
- Genre:
  - Research Report (0.52)
- Technology:
  - Information Technology > Artificial Intelligence
    - Natural Language (1.00)
    - Machine Learning > Neural Networks (1.00)
    - Speech > Speech Recognition (0.68)