Traveling Words: A Geometric Interpretation of Transformers

Sep-18-2023, arXiv.org Artificial Intelligence

Transformers have significantly advanced the field of natural language processing, but comprehending their internal mechanisms remains a challenge. In this paper, we introduce a novel geometric perspective that elucidates the inner mechanisms of transformer operations. Our primary contribution is illustrating how layer normalization confines the latent features to a hyper-sphere, subsequently enabling attention to mold the semantic representation of words on this surface. This geometric viewpoint seamlessly connects established properties such as iterative refinement and contextual embeddings. We validate our insights by probing a pre-trained 124M parameter GPT-2 model. Our findings reveal clear query-key attention patterns in early layers and build upon prior observations regarding the subject-specific nature of attention heads at deeper layers. Harnessing these geometric insights, we present an intuitive understanding of transformers, depicting them as processes that model the trajectory of word particles along the hyper-sphere.
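The claim that layer normalization confines features to a hyper-sphere can be checked numerically. The following is a minimal sketch, not the paper's code: it assumes plain layer normalization with the learned gain and bias omitted, and the helper names `layer_norm` and `l2_norm` are hypothetical. Under that assumption, every output vector has zero mean and a norm close to sqrt(d), regardless of the scale or offset of the input.

```python
import math
import random


def layer_norm(x, eps=1e-5):
    """Layer normalization without learned gain/bias (an assumption of this sketch)."""
    d = len(x)
    mean = sum(x) / d
    # Biased variance over the feature dimension, as in standard layer norm.
    var = sum((v - mean) ** 2 for v in x) / d
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def l2_norm(x):
    """Euclidean length of a vector."""
    return math.sqrt(sum(v * v for v in x))


random.seed(0)
d = 768  # hidden size of the 124M-parameter GPT-2 model

# Three random inputs with very different scales and offsets.
settings = [(0.1, 0.0), (1.0, 5.0), (30.0, -2.0)]
vectors = [[random.gauss(shift, scale) for _ in range(d)] for scale, shift in settings]

normalized = [layer_norm(v) for v in vectors]
norms = [l2_norm(y) for y in normalized]
sums = [sum(y) for y in normalized]

for (scale, shift), n in zip(settings, norms):
    print(f"scale={scale:>5}, shift={shift:>5}: norm={n:.3f} (sqrt(d)={math.sqrt(d):.3f})")
```

Subtracting the mean places each vector on the hyperplane orthogonal to the all-ones direction, and dividing by the standard deviation fixes its length at roughly sqrt(d), so the outputs lie on a sphere of dimension d - 2 inside that hyperplane. A learned gain and bias, when present, would rescale and shift this surface.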

layer normalization, matrix, vector

Country:
- South America > Bolivia (0.04)
- North America
  - United States
    - Oklahoma > Rogers County (0.04)
    - Utah > Salt Lake County
      - Murray (0.04)
  - Canada > Alberta
    - Census Division No. 6 > Calgary Metropolitan Region > Calgary (0.04)
- Europe > United Kingdom
  - Scotland (0.04)
- Asia
  - Nepal (0.04)
  - Middle East
    - Yemen (0.04)
    - Republic of Türkiye > Ankara Province
      - Ankara (0.04)
    - Iran > Tehran Province
      - Tehran (0.04)
  - Japan > Honshū
    - Tōhoku > Fukushima Prefecture > Fukushima (0.04)
  - India
    - Tamil Nadu > Chennai (0.04)
    - Maharashtra (0.04)
- Africa
  - Tanzania (0.04)
  - Middle East > Djibouti
    - Arta > `Arta (0.04)

Genre:
- Research Report > New Finding (0.87)

Industry:
- Leisure & Entertainment (0.93)
- Health & Medicine (0.67)
- Food & Agriculture > Agriculture (0.67)
- Government > Regional Government
  - North America Government > United States Government (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (0.89)
  - Machine Learning > Neural Networks
    - Deep Learning (0.49)
