Stand-Alone Self-Attention in Vision Models

Niki Parmar, Prajit Ramachandran, Ashish Vaswani, Irwan Bello, Anselm Levskaya, Jon Shlens

Neural Information Processing Systems 

Convolutions are a fundamental building block of modern computer vision systems. Recent approaches have argued for going beyond convolutions in order to capture long-range dependencies. These efforts focus on augmenting convolutional models with content-based interactions, such as self-attention and non-local means, to achieve gains on a number of vision tasks. The natural question that arises is whether attention can be a stand-alone primitive for vision models instead of serving as just an augmentation on top of convolutions. In developing and testing a pure self-attention vision model, we verify that self-attention can indeed be an effective stand-alone layer.
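To make the idea of a stand-alone attention layer concrete, below is a minimal sketch of single-head local self-attention over a k x k spatial neighborhood, the kind of layer the abstract describes as a replacement for spatial convolutions. The projection matrices, shapes, padding scheme, and function names here are illustrative assumptions rather than the authors' reference implementation, and the relative positional embeddings used in the paper are omitted for brevity.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def local_self_attention(x, Wq, Wk, Wv, k=3):
    """Single-head local self-attention (illustrative sketch).

    x: (H, W, C_in) feature map; Wq, Wk, Wv: (C_in, C_out) projections.
    Each output position attends over its k x k zero-padded neighborhood,
    playing the role a k x k spatial convolution would otherwise play.
    Relative position embeddings from the paper are omitted here.
    """
    H, W, _ = x.shape
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    q = x @ Wq       # queries from the center pixel
    kmap = xp @ Wk   # keys over the padded map
    vmap = xp @ Wv   # values over the padded map
    d = Wq.shape[1]
    out = np.zeros((H, W, d))
    for i in range(H):
        for j in range(W):
            # Gather the k x k neighborhood of keys and values around (i, j).
            keys = kmap[i:i + k, j:j + k].reshape(-1, d)
            vals = vmap[i:i + k, j:j + k].reshape(-1, d)
            logits = keys @ q[i, j] / np.sqrt(d)
            out[i, j] = softmax(logits) @ vals
    return out


# Example usage on a hypothetical 32x32x64 feature map, standing in for a
# 3x3 spatial convolution with 64 output channels.
rng = np.random.default_rng(0)
x = rng.normal(size=(32, 32, 64))
Wq, Wk, Wv = (rng.normal(size=(64, 64)) * 0.1 for _ in range(3))
y = local_self_attention(x, Wq, Wk, Wv, k=3)
print(y.shape)  # (32, 32, 64)
```

The explicit per-pixel loop keeps the mechanics visible; a practical implementation would vectorize the neighborhood extraction and add multiple heads and relative positional terms as described in the paper.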
