Stable, Fast and Accurate: Kernelized Attention with Relative Positional Encoding

Shengjie Luo, Di He

Mar-21-2025, 13:50:42 GMT–Neural Information Processing Systems

The attention module, a crucial component of the Transformer, cannot scale efficiently to long sequences because its cost grows quadratically with sequence length. Many works focus on approximating the dot-then-exponentiate softmax function in the original attention, leading to sub-quadratic or even linear-complexity Transformer architectures. However, we show that these methods cannot be applied to more powerful attention modules that go beyond the dot-then-exponentiate style, e.g., Transformers with relative positional encoding (RPE). Since relative positional encoding is used by default in many state-of-the-art models, designing efficient Transformers that can incorporate RPE is appealing. In this paper, we propose a novel way to accelerate attention calculation for Transformers with RPE, built on top of kernelized attention.
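To make the complexity argument concrete, the following is a minimal sketch of kernelized attention in plain Python, not the authors' implementation. It assumes the attention score between a query and a key is approximated by an inner product of feature vectors, phi(q) . phi(k). The feature map `phi` used here (elementwise exponential) and both function names are hypothetical stand-ins chosen for illustration. The quadratic version forms every pairwise score; the linear version computes one summary of the keys and values and reuses it for every query.

```python
import math

def phi(x):
    # Hypothetical positive feature map (elementwise exp), used only for
    # illustration; it is an assumption, not the map used in the paper.
    return [math.exp(t) for t in x]

def kernel_attention_quadratic(Q, K, V):
    """O(n^2) in sequence length: every pairwise score phi(q_i) . phi(k_j) is formed."""
    d = len(V[0])
    out = []
    for q in Q:
        fq = phi(q)
        scores = [sum(a * b for a, b in zip(fq, phi(k))) for k in K]
        total = sum(scores)
        out.append([sum(s * v[c] for s, v in zip(scores, V)) / total
                    for c in range(d)])
    return out

def kernel_attention_linear(Q, K, V):
    """O(n) in sequence length: summarize keys and values once, reuse per query."""
    m, d = len(K[0]), len(V[0])
    S = [[0.0] * d for _ in range(m)]  # running sum of phi(k_j) v_j^T
    z = [0.0] * m                      # running sum of phi(k_j)
    for k, v in zip(K, V):
        fk = phi(k)
        for a in range(m):
            z[a] += fk[a]
            for c in range(d):
                S[a][c] += fk[a] * v[c]
    out = []
    for q in Q:
        fq = phi(q)
        denom = sum(fq[a] * z[a] for a in range(m))
        out.append([sum(fq[a] * S[a][c] for a in range(m)) / denom
                    for c in range(d)])
    return out

# Toy inputs: 3 positions, 2-dimensional queries, keys, and values.
Q = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5]]
K = [[0.2, 0.1], [-0.3, 0.4], [0.5, 0.0]]
V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
slow = kernel_attention_quadratic(Q, K, V)
fast = kernel_attention_linear(Q, K, V)
```

Both functions return the same values up to floating-point rounding; only the order of summation differs. The shared summary works because each score factors into a query part and a key part. If a relative positional term that depends on the offset between positions i and j is added inside the exponent, the score no longer factors this way, which is one way to read the abstract's claim that such approximations do not carry over directly to RPE.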

large language model, machine learning, natural language

Country:
- Asia > China (0.28)

Genre:
- Research Report (0.66)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning
    - Neural Networks > Deep Learning (0.48)
    - Statistical Learning (0.49)
  - Natural Language > Large Language Model (0.34)
  - Representation & Reasoning (1.00)
  - Vision (1.00)