Bridging the Divide: Reconsidering Softmax and Linear Attention

Neural Information Processing Systems 

Widely adopted in modern Vision Transformer designs, Softmax attention effectively captures long-range visual information; however, its cost grows quadratically with the number of tokens, making it prohibitively expensive for high-resolution inputs.
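The quadratic-versus-linear tradeoff the abstract alludes to can be sketched in a few lines of NumPy. This is an illustrative comparison only, not the paper's method: softmax attention materializes an N x N score matrix, while linear attention applies a kernel feature map `phi` (here an assumed ReLU-based map) so that the d x d product K^T V can be computed first, giving cost linear in N.

```python
import numpy as np

def softmax_attention(Q, K, V):
    # Forms an explicit N x N score matrix: O(N^2 * d) time and O(N^2) memory.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V):
    # Replaces softmax with a kernel feature map phi (assumed ReLU here),
    # so K^T V (a d x d matrix) is computed first: O(N * d^2) time.
    phi = lambda x: np.maximum(x, 0.0) + 1e-6
    Qp, Kp = phi(Q), phi(K)
    kv = Kp.T @ V                    # (d, d) summary of keys and values
    z = Qp @ Kp.sum(axis=0)          # (N,) per-query normalizer
    return (Qp @ kv) / z[:, None]

# Toy example: 16 tokens, 8-dimensional heads.
rng = np.random.default_rng(0)
N, d = 16, 8
Q, K, V = (rng.standard_normal((N, d)) for _ in range(3))
out_soft = softmax_attention(Q, K, V)
out_lin = linear_attention(Q, K, V)
print(out_soft.shape, out_lin.shape)
```

Both variants produce an (N, d) output; the difference is purely in how the cost scales as N (the number of image tokens) grows with input resolution.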
