Neural Information Processing Systems
Clustered attention exploits similarities between queries by grouping them into clusters, which reduces the computational cost. In particular, we perform fast clustering using locality-sensitive hashing and K-Means, and compute the attention only once per cluster.
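The idea can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function names (`clustered_attention`, `full_attention`), the parameters (`n_clusters`, `n_bits`, `n_iters`), the choice of random-hyperplane sign bits as the hash, and plain Lloyd iterations on the resulting codes are all assumptions made for illustration. Queries are hashed to short binary codes, the codes are clustered, and one attention distribution is computed per cluster centroid and shared by every query in that cluster.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def full_attention(Q, K, V):
    # Reference: standard scaled dot-product attention, one row per query.
    d = Q.shape[1]
    return softmax(Q @ K.T / np.sqrt(d)) @ V


def clustered_attention(Q, K, V, n_clusters=4, n_bits=16, n_iters=10, seed=0):
    # Hypothetical sketch; requires n_clusters <= number of queries.
    rng = np.random.default_rng(seed)
    n, d = Q.shape

    # 1) Locality-sensitive hashing (assumed variant): the sign of random
    #    projections gives an n_bits binary code for each query.
    planes = rng.standard_normal((d, n_bits))
    codes = (Q @ planes > 0).astype(np.float64)

    # 2) K-Means (Lloyd iterations) on the binary codes.
    centers = codes[rng.choice(n, size=n_clusters, replace=False)]
    for _ in range(n_iters):
        dists = ((codes[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for c in range(n_clusters):
            members = codes[labels == c]
            if len(members) > 0:
                centers[c] = members.mean(axis=0)

    # 3) Attention is computed once per cluster, using the mean query of the
    #    cluster, and the result is broadcast to all members of that cluster.
    out = np.zeros((n, V.shape[1]))
    for c in range(n_clusters):
        mask = labels == c
        if not mask.any():
            continue
        q_centroid = Q[mask].mean(axis=0)
        weights = softmax(q_centroid @ K.T / np.sqrt(d))
        out[mask] = weights @ V
    return out, labels


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    Q = rng.standard_normal((32, 8))
    K = rng.standard_normal((32, 8))
    V = rng.standard_normal((32, 8))
    approx, labels = clustered_attention(Q, K, V)
    exact = full_attention(Q, K, V)
    print("mean absolute error vs. full attention:", np.abs(approx - exact).mean())
```

In this sketch, full attention evaluates one row of scores per query, while the clustered variant evaluates one row per cluster, so the number of softmax rows drops from the number of queries to `n_clusters`. The output is an approximation whose quality depends on how similar the queries within each cluster are.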
Feb-11-2026, 03:58:40 GMT