CAST: Clustering Self-Attention using Surrogate Tokens for Efficient Transformers
