Hidden Markov Transformer for Simultaneous Machine Translation
–arXiv.org Artificial Intelligence
Simultaneous machine translation (SiMT) outputs the target sequence while receiving the source sequence, and hence learning when to start translating each target token is the core challenge for SiMT task. However, it is non-trivial to learn the optimal moment among many possible moments of starting translating, as the moments of starting translating always hide inside the model and can only be supervised with the observed target sequence. In this paper, we propose a Hidden Markov Transformer (HMT), which treats the moments of starting translating as hidden events and the target sequence as the corresponding observed events, thereby organizing them as a hidden Markov model. HMT explicitly models multiple moments of starting translating as the candidate hidden events, and then selects one to generate the target token. During training, by maximizing the marginal likelihood of the target sequence over multiple moments of starting translating, HMT learns to start translating at the moments that target tokens can be generated more accurately. Recently, with the increase of real-time scenarios such as live broadcasting, video subtitles and conferences, simultaneous machine translation (SiMT) attracts more attention (Cho & Esipova, 2016; Gu et al., 2017; Ma et al., 2019; Arivazhagan et al., 2019), which requires the model to receive source token one by one and simultaneously generates the target tokens. For the purpose of high-quality translation under low latency, SiMT model needs to learn when to start translating each target token (Gu et al., 2017), thereby making a wise decision between waiting for the next source token (i.e., READ action) and generating a target token (i.e., WRITE action) during the translation process. However, learning when to start translating target tokens is not trivial for a SiMT model, as the moments of starting translating always hide inside the model and we can only supervise the SiMT model with the observed target sequence (Zhang & Feng, 2022a).
arXiv.org Artificial Intelligence
Mar-1-2023
- Country:
- Oceania > Australia
- North America
- Dominican Republic (0.04)
- United States
- Washington > King County
- Seattle (0.04)
- Pennsylvania > Philadelphia County
- Philadelphia (0.04)
- Minnesota > Hennepin County
- Minneapolis (0.14)
- Louisiana > Orleans Parish
- New Orleans (0.04)
- California > Santa Clara County
- Palo Alto (0.04)
- Washington > King County
- Europe
- Germany > Berlin (0.04)
- Spain > Valencian Community
- Valencia Province > Valencia (0.04)
- Italy > Tuscany
- Florence (0.04)
- Ireland > Leinster
- County Dublin > Dublin (0.04)
- Asia
- Genre:
- Research Report (0.64)