SOFT: Softmax-free Transformer with Linear Complexity

Neural Information Processing Systems 

Vision transformers (ViTs) have pushed the state-of-the-art for various visual recognition tasks by patch-wise image tokenization followed by self-attention.