Sigmoid Self-Attention is Better than Softmax Self-Attention: A Mixture-of-Experts Perspective
