Momentum Transformer: Closing the Performance Gap Between Self-attention and Its Linearization
