Improving Autoregressive NLP Tasks via Modular Linearized Attention
