Multipole Semantic Attention: A Fast Approximation of Softmax Attention for Pretraining