Bridging the Divide: Reconsidering Softmax and Linear Attention
Neural Information Processing Systems
Widely adopted in modern Vision Transformer designs, softmax attention effectively captures long-range visual information; however, because its cost scales quadratically with the number of tokens, it becomes prohibitively expensive on high-resolution inputs.
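The complexity gap the abstract alludes to can be illustrated with a minimal sketch. The snippet below is a generic NumPy illustration of softmax attention (which materializes an N x N score matrix) versus a kernelized linear attention (which reassociates the matrix product to avoid it); the feature map `phi` is an assumed placeholder, not the method proposed in this paper.

```python
import numpy as np

def softmax_attention(Q, K, V):
    # Standard softmax attention: the (N, N) score matrix makes
    # both time and memory quadratic in the token count N.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])            # (N, N)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                                  # (N, d)

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0) + 1e-6):
    # Linear attention replaces softmax with a positive feature map
    # phi (a placeholder choice here), so K^T V can be computed once:
    # cost becomes linear in N at fixed head dimension d.
    Qp, Kp = phi(Q), phi(K)
    kv = Kp.T @ V                                       # (d, d)
    z = Qp @ Kp.sum(axis=0)                             # (N,) normalizer
    return (Qp @ kv) / z[:, None]                       # (N, d)
```

For N tokens of dimension d, the softmax path costs O(N^2 d) while the linear path costs O(N d^2), which is why linear attention is attractive for high-resolution inputs where N is large.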