Transformers meet Stochastic Block Models: Attention with Data-Adaptive Sparsity and Cost
