Dissecting Query-Key Interaction in Vision Transformers

Neural Information Processing Systems 

Self-attention in vision transformers is often interpreted as performing perceptual grouping: tokens attend to other tokens with similar embeddings, which may correspond to semantically similar features of an object.
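The similarity-based attention described above can be sketched as standard scaled dot-product attention. This is a generic illustration, not the paper's analysis method; the identity query/key projections are an assumption chosen so that attention reduces to raw embedding similarity.

```python
import numpy as np

def attention_weights(X, W_q, W_k):
    """Scaled dot-product attention weights for token embeddings X (n, d)."""
    Q = X @ W_q
    K = X @ W_k
    d = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(scores)
    return exp / exp.sum(axis=-1, keepdims=True)

# Three toy tokens: the first two have similar embeddings, the third differs.
X = np.array([[1.0, 0.1],
              [0.9, 0.2],
              [-1.0, 1.0]])

# Identity projections (illustrative assumption): attention then reflects
# plain dot-product similarity between token embeddings.
I = np.eye(2)
A = attention_weights(X, I, I)

# Token 0 attends more strongly to the similar token 1 than to token 2,
# consistent with the perceptual-grouping interpretation.
assert A[0, 1] > A[0, 2]
```

Under learned query and key projections the picture is richer, since the projections can reshape which directions of the embedding space count as "similar"; that query-key interaction is what the paper dissects.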