Graph Convolutions Enrich the Self-Attention in Transformers!

May-25-2025, 01:59:21 GMT–Neural Information Processing Systems

Transformers, renowned for their self-attention mechanism, have achieved state-of-the-art performance across various tasks in natural language processing, computer vision, time-series modeling, etc. However, one of the challenges with deep Transformer models is the oversmoothing problem, where representations across layers converge to indistinguishable values, leading to significant performance degradation. We interpret the original self-attention as a simple graph filter and redesign it from a graph signal processing (GSP) perspective.
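
The graph-filter reading of self-attention can be illustrated with a small NumPy sketch. It is not the paper's method: it only shows that a softmax attention matrix is row-stochastic, so it can be read as the adjacency matrix of a weighted directed graph over tokens, and that applying it repeatedly pulls token representations together. The names `attention_matrix` and `token_spread`, the random weights, and the reuse of one fixed attention matrix across all layers (with no residual connections or feed-forward blocks) are simplifying assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_matrix(X, Wq, Wk):
    # Scaled dot-product attention weights; each row sums to 1.
    d = Wq.shape[1]
    scores = (X @ Wq) @ (X @ Wk).T / np.sqrt(d)
    return softmax(scores, axis=-1)

def token_spread(H):
    # Hypothetical oversmoothing measure: the largest per-feature range
    # (max minus min over tokens). It is 0 when all tokens are identical.
    return float((H.max(axis=0) - H.min(axis=0)).max())

rng = np.random.default_rng(0)
n_tokens, dim = 8, 16
X = rng.normal(size=(n_tokens, dim))
Wq = rng.normal(size=(dim, dim)) / np.sqrt(dim)
Wk = rng.normal(size=(dim, dim)) / np.sqrt(dim)

# Row-stochastic matrix: a weighted adjacency over the token graph.
A = attention_matrix(X, Wq, Wk)

# Assumption: reuse the same A at every "layer" to isolate the filtering effect.
spreads = [token_spread(X)]
H = X
for _ in range(12):
    H = A @ H  # each new token is a convex combination of the old tokens
    spreads.append(token_spread(H))

print("row sums:", A.sum(axis=1))
print("spread per layer:", np.round(spreads, 4))
```

Because every row of `A` is a convex combination, the per-feature range can never grow from one step to the next, which is the low-pass behavior behind oversmoothing in this toy setting.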

large language model, machine learning, natural language

Country:
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre:
- Research Report
  - Experimental Study (0.93)
  - New Finding (0.67)

Industry:
- Government (0.46)
- Information Technology (0.67)
- Media (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
  - Natural Language
    - Large Language Model (0.69)
    - Text Processing (0.68)
  - Vision (1.00)