Position Information in Transformers: An Overview

Philipp Dufter, Martin Schmitt, Hinrich Schütze

Feb-22-2021–arXiv.org Artificial Intelligence

Transformers are arguably the main workhorse in recent natural language processing (NLP) research. Without explicit position information, a Transformer is by definition insensitive to the order of its input: reordering the input tokens merely reorders the outputs. However, language is inherently sequential, and word order is essential to the semantics and syntax of an utterance. In this paper, we provide an overview of common methods to incorporate position information into Transformer models. The objectives of this survey are to i) showcase that position information in Transformers is a vibrant and extensive research area; ii) enable the reader to compare existing methods by providing a unified notation and a meaningful clustering; iii) indicate which characteristics of an application should be taken into account when selecting a position encoding; and iv) provide stimuli for future research.

The Transformer model as introduced by Vaswani et al. (2017) has been found to perform well on many tasks, such as machine translation and language modeling. With the rise of pretrained language models (PLMs) (Peters et al., 2018; Howard & Ruder, 2018; Devlin et al., 2019; Brown et al., 2020), Transformer models have become even more popular; as a result, they are at the core of many state-of-the-art NLP models. A Transformer model consists of several layers, or blocks. Each layer is a self-attention (Vaswani et al., 2017) module followed by a feed-forward layer, with layer normalization and residual connections as additional components.
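To make the order-insensitivity point concrete, the following is a minimal NumPy sketch, not the authors' code. It assumes a single-head, post-norm layer (self-attention, then a ReLU feed-forward block, each wrapped in a residual connection and a layer normalization without learned gain or bias) with random weights; the names `transformer_layer`, `sinusoidal_positions`, and `params` are hypothetical. It checks that permuting the input only permutes the output, and that adding the sinusoidal position encoding of Vaswani et al. (2017), one common way to inject position information, removes that symmetry.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-5):
    # Normalize each token vector to zero mean, unit variance (no learned gain/bias).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def transformer_layer(x, params):
    """Hypothetical single-head, post-norm layer; x has shape (n_tokens, d_model)."""
    Wq, Wk, Wv, W1, W2 = params
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1])) @ v  # scaled dot-product attention
    h = layer_norm(x + attn)                             # residual + layer norm
    ffn = np.maximum(0.0, h @ W1) @ W2                   # position-wise ReLU feed-forward
    return layer_norm(h + ffn)                           # residual + layer norm

def sinusoidal_positions(n, d):
    # Absolute sinusoidal encoding in the style of Vaswani et al. (2017); assumes even d.
    pos = np.arange(n)[:, None]
    i = np.arange(0, d, 2)[None, :]
    angles = pos / np.power(10000.0, i / d)
    pe = np.zeros((n, d))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

rng = np.random.default_rng(0)
n, d, d_ff = 5, 8, 16
params = [rng.normal(size=s) / np.sqrt(s[0])
          for s in [(d, d), (d, d), (d, d), (d, d_ff), (d_ff, d)]]
x = rng.normal(size=(n, d))       # hypothetical token embeddings
perm = np.roll(np.arange(n), 1)   # a fixed, non-identity reordering of the tokens

# Without position information, reordering the input only reorders the output.
no_pos_equivariant = np.allclose(transformer_layer(x[perm], params),
                                 transformer_layer(x, params)[perm])

# With position encodings added, the same reordering changes the representations.
pe = sinusoidal_positions(n, d)
with_pos_equivariant = np.allclose(transformer_layer(x[perm] + pe, params),
                                   transformer_layer(x + pe, params)[perm])

print(no_pos_equivariant, with_pos_equivariant)
```

The first check holds for any weights because attention, layer normalization, and the feed-forward block treat tokens as an unordered set; the second fails because each position receives a distinct encoding vector, so the model can now tell a token at position 0 from the same token at position 3.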

computational linguistic, information, position information