Metric Transforms and Low Rank Representations of Kernels for Fast Attention

Mar-20-2026, 15:03:25 GMT - Neural Information Processing Systems

We introduce a new linear-algebraic tool based on group representation theory, and use it to address three key problems in machine learning. 1. Past researchers have proposed fast attention algorithms for LLMs by approximating or replacing softmax attention with other functions, such as low-degree polynomials. The key property of these functions is that, when applied entry-wise to the matrix $QK^{\top}$, the result is a low rank matrix when $Q$ and $K$ are $n \times d$ matrices and $n \gg d$. This suggests a natural question: what are all functions $f$ with this property? If other $f$ exist and are quickly computable, they can be used in place of softmax for fast subquadratic attention algorithms.
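The low rank property described above can be checked numerically for one simple case. The sketch below is illustrative only and is not taken from the paper: it assumes the polynomial $f(x) = 1 + x + x^2/2$ (the degree-2 Taylor truncation of $\exp$), random Gaussian $Q$, $K$, $V$, and a hypothetical helper `poly_features`. It shows that $f(QK^{\top})$ has rank at most $1 + d + d(d+1)/2$ regardless of $n$, and that the same factorization lets the normalized attention output be computed without forming the $n \times n$ matrix.

```python
import numpy as np

def poly_features(X):
    """Hypothetical feature map phi with phi(q) . phi(k) = 1 + q.k + (q.k)^2 / 2."""
    n, d = X.shape
    ones = np.ones((n, 1))
    # All d*d products x_i * x_j; the 1/sqrt(2) scaling makes the inner
    # product of two such blocks equal (q.k)^2 / 2.
    quad = np.einsum("ni,nj->nij", X, X).reshape(n, d * d) / np.sqrt(2.0)
    return np.hstack([ones, X, quad])

rng = np.random.default_rng(0)
n, d = 200, 3  # n >> d, as in the setting described above
Q = rng.standard_normal((n, d))
K = rng.standard_normal((n, d))
V = rng.standard_normal((n, d))

# Quadratic-time reference: apply f entry-wise to the n x n matrix Q K^T.
S = Q @ K.T
A = 1.0 + S + 0.5 * S**2
out_slow = (A @ V) / A.sum(axis=1, keepdims=True)

# Numerical rank of f(Q K^T): count singular values above a relative threshold.
sing = np.linalg.svd(A, compute_uv=False)
numerical_rank = int(np.sum(sing > 1e-9 * sing[0]))
rank_bound = 1 + d + d * (d + 1) // 2  # constant + linear + symmetric quadratic terms

# Fast path: never form the n x n matrix, only n x r factors with r = 1 + d + d*d.
PQ, PK = poly_features(Q), poly_features(K)
numer = PQ @ (PK.T @ V)       # equals A @ V
denom = PQ @ PK.sum(axis=0)   # equals the row sums of A
out_fast = numer / denom[:, None]

print("numerical rank:", numerical_rank, "bound:", rank_bound)
print("max abs difference:", np.abs(out_slow - out_fast).max())
```

The fast path costs time linear in $n$ for fixed $d$, because each product involves only $n \times r$ factors. The row normalization is safe for this particular $f$ because $1 + x + x^2/2$ is strictly positive for every real $x$.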

artificial intelligence, machine learning, proceedings

Technology:
- Information Technology > Artificial Intelligence > Machine Learning (1.00)