Learning Advanced Self-Attention for Linear Transformers in the Singular Value Domain
Hyowon Wi, Jeongwhan Choi, Noseong Park
–arXiv.org Artificial Intelligence
Transformers have demonstrated remarkable performance across diverse domains. The key component of Transformers is self-attention, which learns the relationship between any two tokens in the input sequence. Recent studies have revealed that self-attention can be understood as a normalized adjacency matrix of a graph. Notably, from the perspective of graph signal processing (GSP), self-attention can be equivalently defined as a simple graph filter that applies GSP using the value vector as the signal. However, self-attention is a graph filter defined with only the first order of the polynomial matrix, and it acts as a low-pass filter, preventing the effective leverage of various frequency information. Consequently, existing self-attention mechanisms are designed in a rather simplified manner. Therefore, we propose a novel method, called Attentive Graph Filter (AGF), which interprets self-attention as learning a graph filter in the singular value domain from the perspective of graph signal processing for directed graphs, with linear complexity w.r.t. the input length n, i.e., O(nd).
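The graph-filter reading the abstract describes can be illustrated with a toy sketch: the row-softmax of QK^T/sqrt(d) is a row-stochastic matrix, i.e., a normalized adjacency matrix of a directed graph, and the attention output is one hop of graph filtering with the value vectors as the signal. This is standard self-attention written in GSP terms, not the AGF method itself; the sizes and random inputs below are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable row-wise softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
n, d = 5, 4                          # toy sequence length and head dimension
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))

# Normalized adjacency matrix of a directed graph over the n tokens:
# each row sums to 1, like a degree-normalized weight matrix D^{-1}W.
A = softmax(Q @ K.T / np.sqrt(d))

# First-order (low-pass) graph filter: the signal V is smoothed by one
# application of A -- exactly the standard self-attention output A @ V.
out = A @ V
```

Because the filter is only first order in A, repeated application averages the signal toward the dominant singular directions, which is the low-pass behavior the abstract points to.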
May-14-2025