CAST: Clustering Self-Attention using Surrogate Tokens for Efficient Transformers