
Neural Information Processing Systems 

Clustered attention exploits similarities between queries by grouping them into clusters, which reduces the computational cost. In particular, we perform fast clustering using locality-sensitive hashing and K-Means, and compute attention only once per cluster rather than once per query.
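The idea can be illustrated with a small NumPy sketch. It is not the authors' implementation. It assumes three things: queries are hashed with random-hyperplane LSH into binary codes, Lloyd's K-Means is run on those codes under Hamming distance, and each cluster is represented by the mean of its real-valued queries. The function names `lsh_codes`, `kmeans_hamming` and `clustered_attention`, and the default values of `n_clusters`, `n_bits` and `n_iters`, are hypothetical choices made for illustration.

```python
import numpy as np


def lsh_codes(x, n_bits, rng):
    """Random-hyperplane LSH (assumed hash family): one bit per hyperplane."""
    planes = rng.standard_normal((x.shape[1], n_bits))
    return (x @ planes > 0).astype(np.int8)


def _assign(codes, centroids):
    """Index of the nearest centroid for each code under Hamming distance."""
    dist = (codes[:, None, :] != centroids[None, :, :]).sum(axis=-1)  # (N, C)
    return dist.argmin(axis=1)


def kmeans_hamming(codes, n_clusters, n_iters, rng):
    """Lloyd's K-Means on binary codes; centroids are bitwise majority votes."""
    init = rng.choice(codes.shape[0], size=n_clusters, replace=False)
    centroids = codes[init].copy()
    for _ in range(n_iters):
        labels = _assign(codes, centroids)
        for c in range(n_clusters):
            members = codes[labels == c]
            if len(members) > 0:  # leave empty clusters unchanged
                centroids[c] = (members.mean(axis=0) >= 0.5).astype(np.int8)
    return _assign(codes, centroids)


def clustered_attention(q, k, v, n_clusters=4, n_bits=16, n_iters=10, seed=0):
    """Approximate softmax attention by attending once per query cluster.

    q: (N, D) queries, k: (N_k, D) keys, v: (N_k, D_v) values.
    Returns the (N, D_v) output and the cluster label of each query.
    """
    rng = np.random.default_rng(seed)
    labels = kmeans_hamming(lsh_codes(q, n_bits, rng), n_clusters, n_iters, rng)
    # Re-index so that only non-empty clusters remain (ids 0..C'-1).
    _, labels = np.unique(labels, return_inverse=True)
    n_c = labels.max() + 1
    # Mean of the real-valued queries in each cluster: (C', D).
    centroids = np.zeros((n_c, q.shape[1]))
    np.add.at(centroids, labels, q)
    centroids /= np.bincount(labels, minlength=n_c)[:, None]
    # Softmax attention computed once per centroid: (C', N_k).
    scores = centroids @ k.T / np.sqrt(q.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)
    # Every query receives the output of its cluster: (N, D_v).
    return (attn @ v)[labels], labels
```

In this sketch the attention step costs on the order of C·N_k score computations instead of N·N_k, plus the clustering overhead; with `n_clusters=1` it degenerates to attending with the mean query, and queries that hash identically always share an output.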