EIT: Enhanced Interactive Transformer
Tong Zheng, Bei Li, Huiwen Bao, Tong Xiao, Jingbo Zhu
arXiv.org Artificial Intelligence
In this paper, we propose a novel architecture, the Enhanced Interactive Transformer (EIT), to address the issue of head degradation in self-attention mechanisms. Our approach replaces the traditional multi-head self-attention mechanism with the Enhanced Multi-Head Attention (EMHA) mechanism, which relaxes the one-to-one mapping constraint between queries and keys, allowing each query to attend to multiple keys. Furthermore, we introduce two interaction models, Inner-Subspace Interaction and Cross-Subspace Interaction, to fully exploit the many-to-many mapping capabilities of EMHA. Extensive experiments on a wide range of tasks (e.g., machine translation, abstractive summarization, grammar correction, language modelling and brain disease automatic diagnosis) demonstrate its superiority with a very modest increase in model size.
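The core idea in the abstract, relaxing the one-to-one query-key head mapping so that each query subspace can attend to multiple key subspaces, can be illustrated with a minimal sketch. This is not the paper's implementation; the function name `many_to_many_attention` and the choice to produce one attention map per (query subspace, key subspace) pair are assumptions made purely for illustration:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def many_to_many_attention(X, Wq, Wk, Wv, n_heads):
    """Hypothetical sketch: every query subspace attends to every
    key/value subspace, giving n_heads * n_heads head outputs instead
    of the n_heads one-to-one pairs of standard multi-head attention."""
    T, d = X.shape
    dh = d // n_heads
    Q = (X @ Wq).reshape(T, n_heads, dh)  # split queries into subspaces
    K = (X @ Wk).reshape(T, n_heads, dh)  # split keys into subspaces
    V = (X @ Wv).reshape(T, n_heads, dh)
    outs = []
    for i in range(n_heads):          # each query subspace ...
        for j in range(n_heads):      # ... attends to each key subspace
            scores = Q[:, i] @ K[:, j].T / np.sqrt(dh)
            outs.append(softmax(scores) @ V[:, j])
    # A real model would mix these n_heads^2 outputs (e.g. via the paper's
    # inner-/cross-subspace interactions) and project back to dimension d.
    return np.stack(outs, axis=1)     # shape (T, n_heads^2, dh)
```

In standard multi-head attention the two loops would collapse to the diagonal case `i == j`; the extra off-diagonal pairs are what the many-to-many relaxation adds.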
Dec-20-2022
- Country:
- Asia (0.67)
- Genre:
- Research Report (0.64)
- Industry:
- Health & Medicine > Therapeutic Area > Neurology (1.00)