Massive Values in Self-Attention Modules are the Key to Contextual Knowledge Understanding

Jin, Mingyu, Mei, Kai, Xu, Wujiang, Sun, Mingjie, Tang, Ruixiang, Du, Mengnan, Liu, Zirui, Zhang, Yongfeng

Feb-3-2025–arXiv.org Artificial Intelligence

Large language models (LLMs) have achieved remarkable success in contextual knowledge understanding. In this paper, we show that concentrated massive values consistently emerge in specific regions of attention queries (Q) and keys (K), but not in values (V), across various modern transformer-based LLMs (Q, K, and V denote the representations output by the query, key, and value layers, respectively). Through extensive experiments, we further demonstrate that these massive values play a critical role in interpreting contextual knowledge (i.e., knowledge obtained from the current context window) rather than in retrieving parametric knowledge stored within the model's parameters. Our investigation of quantization strategies further reveals that ignoring these massive values leads to a pronounced drop in performance on tasks requiring rich contextual understanding, consistent with our analysis. Finally, we trace the emergence of concentrated massive values and find that this concentration is caused by Rotary Positional Encoding (RoPE) and is already present in the earliest layers. These findings shed new light on how Q and K operate in LLMs and offer practical insights for model design and optimization.
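The abstract does not spell out how "concentrated massive values" are identified. The sketch below assumes one simple rule for illustration: for a single attention head, take the L2 norm of each hidden dimension across all tokens and flag the dimensions whose norm exceeds a multiple (`ratio`, a hypothetical default of 5) of the mean norm. The function names, the threshold rule, and the synthetic data are all hypothetical stand-ins; in practice the matrices would come from a model's query, key, and value projections, and this is not presented as the authors' exact procedure.

```python
import math
import random
from typing import List, Sequence

# One attention head: x[token][dim], shape [num_tokens][head_dim].
Matrix = Sequence[Sequence[float]]


def dim_magnitudes(x: Matrix) -> List[float]:
    """L2 norm of each hidden dimension, taken across all tokens."""
    head_dim = len(x[0])
    return [math.sqrt(sum(row[d] ** 2 for row in x)) for d in range(head_dim)]


def massive_dims(x: Matrix, ratio: float = 5.0) -> List[int]:
    """Indices of dimensions whose norm exceeds `ratio` times the mean norm.

    Assumption: this mean-ratio threshold (and its default) is illustrative only.
    """
    mags = dim_magnitudes(x)
    mean = sum(mags) / len(mags)
    return [d for d, m in enumerate(mags) if m > ratio * mean]


def synthetic_head(num_tokens: int, head_dim: int, hot: Sequence[int],
                   seed: int = 0) -> List[List[float]]:
    """Hypothetical toy data: small Gaussian noise, plus large values in `hot` dims."""
    rng = random.Random(seed)  # seeded so the result is reproducible
    rows = []
    for _ in range(num_tokens):
        row = [rng.gauss(0.0, 0.1) for _ in range(head_dim)]
        for d in hot:
            row[d] += 8.0  # same large offset on every token keeps the dim's norm high
        rows.append(row)
    return rows


if __name__ == "__main__":
    q_like = synthetic_head(32, 64, hot=[10, 42])  # concentrated massive values
    v_like = synthetic_head(32, 64, hot=[])        # no concentration
    print(massive_dims(q_like))  # → [10, 42]
    print(massive_dims(v_like))  # → []
```

On the Q-like toy matrix the rule flags exactly the two planted dimensions, while on the V-like matrix it flags none, mirroring the Q/K versus V contrast the abstract describes. Normalizing by the mean norm keeps the rule independent of the overall activation scale of a given head.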

large language model, machine learning, natural language

Country:
- North America
  - Dominican Republic (0.04)
  - United States
    - New York (0.04)
    - New Jersey (0.04)
    - Minnesota (0.04)
    - District of Columbia > Washington (0.04)
    - Pennsylvania > Allegheny County
      - Pittsburgh (0.04)
    - Florida > Miami-Dade County
      - Miami (0.04)
    - California > Los Angeles County
      - Los Angeles (0.04)
  - Mexico > Mexico City
    - Mexico City (0.04)
  - Canada > Ontario
    - Toronto (0.04)
- Europe
  - United Kingdom (0.04)
  - Sweden > Stockholm
    - Stockholm (0.04)
  - Russia > Southern Federal District
    - Krasnodar Krai > Krasnodar (0.04)
  - Poland > Łódź Province
    - Łódź (0.04)
  - Italy > Tuscany
    - Florence (0.04)
  - Croatia > Zagreb County
    - Zagreb (0.04)
- Asia
  - Vietnam (0.04)
  - Russia (0.04)
  - Thailand > Bangkok
    - Bangkok (0.04)
  - Middle East > UAE
    - Abu Dhabi Emirate > Abu Dhabi (0.14)
  - Japan > Honshū
    - Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Genre:
- Research Report > New Finding (1.00)

Industry:
- Media > Film (1.00)
- Leisure & Entertainment > Sports (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
