SMYRF - Efficient Attention using Asymmetric Clustering

Neural Information Processing Systems

We propose a novel type of balanced clustering algorithm to approximate attention. Attention complexity is reduced from $O(N^2)$ to $O(N \log N)$, where $N$ is the sequence length. Our algorithm, SMYRF, uses Locality Sensitive Hashing (LSH) in a novel way by defining new asymmetric transformations and an adaptive scheme that produces balanced clusters. The biggest advantage of SMYRF is that it can be used as a drop-in replacement for dense attention layers without any retraining. In contrast, prior fast attention methods impose constraints (e.g.
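The core idea, attending only within balanced clusters of queries and keys, can be sketched as follows. This is an illustrative simplification, not the paper's exact method: it replaces SMYRF's asymmetric transformations and multi-round hashing with a single random-projection LSH score, sorts queries and keys by that score, and splits them into equal-size clusters so each query attends to a cluster of keys in $O(N \log N)$ overall.

```python
import numpy as np

def clustered_attention(Q, K, V, n_clusters):
    """Approximate softmax attention by attending within balanced clusters.

    Sketch only: queries and keys are sorted by their projection onto a
    shared random direction (a crude LSH surrogate) and split into
    equal-size clusters; SMYRF's asymmetric transformations and adaptive
    multi-round hashing are omitted for brevity.
    """
    N, d = Q.shape
    assert N % n_clusters == 0, "N must divide evenly into balanced clusters"
    rng = np.random.default_rng(0)
    direction = rng.standard_normal(d)

    q_order = np.argsort(Q @ direction)  # sort queries by LSH score
    k_order = np.argsort(K @ direction)  # sort keys by the same score

    out = np.zeros_like(V)
    size = N // n_clusters
    for c in range(n_clusters):
        qi = q_order[c * size:(c + 1) * size]  # queries in cluster c
        ki = k_order[c * size:(c + 1) * size]  # keys/values in cluster c
        scores = Q[qi] @ K[ki].T / np.sqrt(d)
        weights = np.exp(scores - scores.max(axis=1, keepdims=True))
        weights /= weights.sum(axis=1, keepdims=True)
        out[qi] = weights @ V[ki]  # each query attends only inside its cluster
    return out
```

With `n_clusters=1` this reduces exactly to dense softmax attention, which is why such a scheme can serve as a drop-in replacement; increasing the cluster count trades accuracy for the reduced complexity described above.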


Review for NeurIPS paper: SMYRF - Efficient Attention using Asymmetric Clustering

Neural Information Processing Systems

This paper proposes a method for reducing the quadratic bottleneck of transformer architectures to O(N log N) using an asymmetric LSH clustering strategy. The paper also shows that finding an optimal assignment is NP-hard, so heuristic approaches must be pursued. The authors propose a novel type of balanced clustering algorithm to approximate attention. The method can be applied directly to pre-trained models and achieves competitive or better performance with BigGAN, BERT, and RoBERTa while using 50% less memory. There was some disagreement among reviewers about this paper, with R1 and R3 recommending solid acceptance, and R2 and R4 recommending weak reject.
