Why Are Positional Encodings Nonessential for Deep Autoregressive Transformers? Revisiting a Petroglyph
arXiv.org Artificial Intelligence
Do autoregressive Transformer language models require explicit positional encodings (PEs)? The answer is "no" as long as they have more than one layer: without explicit PEs, such models can still distinguish sequences whose tokens have been permuted. This property has been known since early efforts (contemporary with GPT-2) to adopt the Transformer for language modeling. However, the result does not appear to have been well disseminated, and it was even rediscovered recently. This may be partly due to the sudden growth of the language modeling community after the advent of GPT-2, but perhaps also due to the lack of a clear explanation in prior publications, even though the property was commonly understood by practitioners at the time. Here we review this long-forgotten explanation of why explicit PEs are nonessential for multi-layer autoregressive Transformers (in contrast, one-layer models do require PEs to discern the order of their input tokens). We also review the origin of this result, and we hope to re-establish it as common knowledge.
Dec-31-2024
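To make the paper's claim concrete, here is a minimal numpy sketch (our illustration, not code from the paper): a single-head causal self-attention layer with no PEs, stripped of residual connections, normalization, and feed-forward blocks. With one layer, the output at the last position is invariant to permutations of the preceding tokens, because it is a softmax-weighted sum over an unordered set of key/value pairs. With a second layer stacked on top, the permutation becomes detectable, because the first layer's intermediate outputs depend on which tokens each position was allowed to attend to.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # model width (arbitrary for this illustration)

# Fixed random projection weights for two attention layers; no PEs anywhere.
Wq = [rng.standard_normal((d, d)) for _ in range(2)]
Wk = [rng.standard_normal((d, d)) for _ in range(2)]
Wv = [rng.standard_normal((d, d)) for _ in range(2)]

def causal_attention(x, Wq, Wk, Wv):
    """Single-head causal self-attention over x of shape (seq_len, d)."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(d)
    # Causal mask: position t attends only to positions <= t.
    scores[np.triu(np.ones_like(scores, dtype=bool), k=1)] = -np.inf
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Embeddings for three distinct tokens a, b, c; no position information.
emb = rng.standard_normal((3, d))
seq_abc = emb[[0, 1, 2]]  # a b c
seq_bac = emb[[1, 0, 2]]  # b a c  (prefix permuted, last token fixed)

# One layer: the last position sees the same *multiset* of keys/values
# under both orders, so its output is identical -> order is invisible.
h_abc = causal_attention(seq_abc, Wq[0], Wk[0], Wv[0])
h_bac = causal_attention(seq_bac, Wq[0], Wk[0], Wv[0])
print(np.allclose(h_abc[-1], h_bac[-1]))  # True

# Two layers: intermediate positions attended to different prefixes in
# layer 1, so layer 2's last-position outputs differ -> order recovered.
g_abc = causal_attention(h_abc, Wq[1], Wk[1], Wv[1])
g_bac = causal_attention(h_bac, Wq[1], Wk[1], Wv[1])
print(np.allclose(g_abc[-1], g_bac[-1]))  # False
```

The same reasoning should carry over to full Transformer blocks: residual connections, layer normalization, and MLPs are position-wise maps, so they neither break the one-layer invariance nor prevent the second layer from exploiting the position-dependent intermediate states.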