SMYRF: Efficient Attention using Asymmetric Clustering

Neural Information Processing Systems 

We propose a novel type of balanced clustering algorithm to approximate attention.
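
To make the idea of approximating attention with balanced clusters concrete, here is a minimal sketch in plain Python. It is not the algorithm proposed in this paper. As a stand-in, it assumes queries and keys are ranked by their projection onto one shared random direction and then cut into equal-size groups, with softmax attention computed only inside each matched group. All names (`balanced_clusters`, `clustered_attention`, `num_clusters`) are hypothetical and chosen for illustration.

```python
import math
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def balanced_clusters(vectors, direction, num_clusters):
    """Split indices into equal-size groups by rank along `direction`.

    Assumption: sorting by a single projection is a toy substitute for
    a real clustering step. Every group gets exactly n / num_clusters items.
    """
    n = len(vectors)
    if n % num_clusters != 0:
        raise ValueError("number of vectors must be divisible by num_clusters")
    order = sorted(range(n), key=lambda i: dot(vectors[i], direction))
    size = n // num_clusters
    return [order[c * size:(c + 1) * size] for c in range(num_clusters)]


def clustered_attention(Q, K, V, num_clusters, seed=0):
    """Approximate softmax(Q K^T / sqrt(d)) V using within-cluster attention only."""
    rng = random.Random(seed)
    d = len(Q[0])
    # Assumption: one random direction shared by queries and keys.
    direction = [rng.gauss(0.0, 1.0) for _ in range(d)]
    q_groups = balanced_clusters(Q, direction, num_clusters)
    k_groups = balanced_clusters(K, direction, num_clusters)

    out = [None] * len(Q)
    for q_idx, k_idx in zip(q_groups, k_groups):
        for i in q_idx:
            # Each query scores only the keys in its matched group.
            scores = [dot(Q[i], K[j]) / math.sqrt(d) for j in k_idx]
            weights = softmax(scores)
            out[i] = [
                sum(w * V[j][c] for w, j in zip(weights, k_idx))
                for c in range(len(V[0]))
            ]
    return out
```

In this sketch, n queries and n keys split into `num_clusters` groups require n * n / `num_clusters` score computations instead of n * n, and setting `num_clusters=1` recovers ordinary dense attention. Because the groups are balanced, every group does the same amount of work.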