Multipole Semantic Attention: A Fast Approximation of Softmax Attention for Pretraining
