EIT: Enhanced Interactive Transformer
Zheng, Tong; Li, Bei; Bao, Huiwen; Xiao, Tong; Zhu, Jingbo
arXiv.org Artificial Intelligence
In this paper, we propose a novel architecture, the Enhanced Interactive Transformer (EIT), to address the issue of head degradation in self-attention mechanisms. Our approach replaces the traditional multi-head self-attention mechanism with the Enhanced Multi-Head Attention (EMHA) mechanism, which relaxes the one-to-one mapping constraint between queries and keys, allowing each query to attend to multiple keys. Furthermore, we introduce two interaction models, Inner-Subspace Interaction and Cross-Subspace Interaction, to fully utilize the many-to-many mapping capabilities of EMHA. Extensive experiments on a wide range of tasks (e.g., machine translation, abstractive summarization, grammar correction, language modelling, and automatic diagnosis of brain disease) demonstrate EIT's superiority at the cost of only a very modest increase in model size.
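The difference between the one-to-one and the relaxed many-to-many mapping can be illustrated with a small sketch. It is not the authors' implementation: it assumes the relaxed mapping means that every query head is scored against every key head, and the function names (`attention_map`, `one_to_one_maps`, `many_to_many_maps`) and the toy inputs are hypothetical. The Inner-Subspace and Cross-Subspace Interaction models are left out because the abstract does not describe how they work.

```python
import math

def softmax(row):
    # Numerically stable softmax over one row of scores.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention_map(q_head, k_head):
    # Scaled dot-product attention weights for one (query head, key head) pair.
    d = len(q_head[0])
    return [
        softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in k_head])
        for q in q_head
    ]

def one_to_one_maps(q_heads, k_heads):
    # Standard multi-head attention: query head i is paired only with key head i.
    return [attention_map(q, k) for q, k in zip(q_heads, k_heads)]

def many_to_many_maps(q_heads, k_heads):
    # Assumed reading of the relaxed mapping: each query head is paired
    # with every key head, giving a grid of attention maps.
    return [[attention_map(q, k) for k in k_heads] for q in q_heads]

# Toy input: 2 heads, 3 tokens, head dimension 2 (arbitrary values).
q_heads = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[0.5, -0.5], [-1.0, 0.0], [0.0, 2.0]],
]
k_heads = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[0.0, 1.0], [2.0, 0.0], [-1.0, 1.0]],
]

standard = one_to_one_maps(q_heads, k_heads)
relaxed = many_to_many_maps(q_heads, k_heads)

print(len(standard))                     # 2 maps, one per head
print(sum(len(row) for row in relaxed))  # 4 maps, one per head pair
```

Under this reading, M heads yield M × M attention maps instead of M, and the diagonal of the grid reproduces the standard maps. How those extra maps are combined is the part the two interaction models would handle.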
Dec-20-2022
- Country:
- Europe > Germany
- Berlin (0.04)
- Asia
- Middle East > Israel (0.04)
- Malaysia > Kuala Lumpur
- Kuala Lumpur (0.04)
- China > Liaoning Province
- Shenyang (0.04)
- Genre:
- Research Report (0.64)
- Industry:
- Health & Medicine > Therapeutic Area > Neurology (1.00)
- Technology: