Generalized Probabilistic Attention Mechanism in Transformers
Heo, DongNyeong, Choi, Heeyoul
–arXiv.org Artificial Intelligence
The Transformer architecture has become widely adopted due to its demonstrated success, attributed to the attention mechanism at its core. Despite these successes, the attention mechanism of Transformers is associated with two well-known issues: rank-collapse and gradient vanishing. In this paper, we present a theoretical analysis showing that it is inherently difficult to address both issues simultaneously in the conventional attention mechanism. To handle these issues, we introduce a novel class of attention mechanisms, referred to as the generalized probabilistic attention mechanism (GPAM), and its dual-attention implementation within the Transformer architecture. Unlike conventional attention mechanisms, GPAM allows for negative attention scores while preserving a fixed total sum. We provide theoretical evidence that the proposed dual-attention GPAM (daGPAM) effectively mitigates both the rank-collapse and gradient-vanishing issues, which are difficult to resolve simultaneously with conventional attention mechanisms. Furthermore, we empirically validate this theoretical evidence, demonstrating the superiority of daGPAM over other alternative attention mechanisms proposed to address the same issues. Additionally, we demonstrate the practical benefits of GPAM in natural language processing tasks, such as language modeling and neural machine translation.

The Transformer model, introduced by Vaswani et al. (2017), has emerged as a pivotal architecture driving the advancement of contemporary deep learning models across various domains, including natural language processing (Brown et al., 2020), audio signal processing (Gulati et al., 2020), and image processing (Dosovitskiy et al., 2021). Central to the Transformer's success is the attention mechanism, which facilitates the contextualization of input token representations.
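The key structural property claimed above is that GPAM permits negative attention scores while keeping each row's total fixed. One simple way to realize such signed weights is as a weighted difference of two softmax distributions; the sketch below illustrates that idea. This is a hypothetical illustration of the stated property, not the authors' exact formulation: the function name `dual_softmax_gpam` and the mixing coefficient `alpha` are assumptions for demonstration.

```python
import numpy as np

def conventional_attention(scores):
    # Standard softmax attention: weights are non-negative and each row sums to 1.
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def dual_softmax_gpam(scores_pos, scores_neg, alpha=2.0):
    # Hypothetical dual-attention form: a weighted difference of two softmax
    # distributions. Individual entries can be negative, yet each row still
    # sums to a fixed total: alpha * 1 - (alpha - 1) * 1 = 1.
    p = conventional_attention(scores_pos)
    q = conventional_attention(scores_neg)
    return alpha * p - (alpha - 1.0) * q

# Deterministic demo: uniform positive branch vs. a peaked negative branch.
p_scores = np.zeros((1, 4))                      # softmax -> uniform [0.25]*4
n_scores = np.array([[10.0, 0.0, 0.0, 0.0]])     # softmax -> nearly one-hot
w = dual_softmax_gpam(p_scores, n_scores)
print(np.allclose(w.sum(axis=-1), 1.0))  # rows still sum to 1
print((w < 0).any())                     # the peaked entry is pushed negative
```

Because both branches are proper probability distributions, the fixed row sum holds for any `alpha`, while `alpha > 1` opens the range of individual weights to negative values.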
Oct-20-2024