Non-autoregressive Streaming Transformer for Simultaneous Translation
Ma, Zhengrui, Zhang, Shaolei, Guo, Shoutao, Shao, Chenze, Zhang, Min, Feng, Yang
arXiv.org Artificial Intelligence
Simultaneous machine translation (SiMT) models are trained to strike a balance between latency and translation quality. However, training these models to achieve high quality while maintaining low latency often leads to a tendency toward aggressive anticipation. We argue that this issue stems from the autoregressive architecture upon which most existing SiMT models are built. To address it, we propose the non-autoregressive streaming Transformer (NAST), which comprises a unidirectional encoder and a non-autoregressive decoder with intra-chunk parallelism. NAST can generate blank or repetitive tokens to adjust its READ/WRITE strategy flexibly, and is trained to maximize the non-monotonic latent alignment with an alignment-based latency loss. Experiments on various SiMT benchmarks demonstrate that NAST outperforms previous strong autoregressive SiMT baselines.
Oct-23-2023
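The abstract's core mechanism, emitting blank or repeated tokens that collapse away (CTC-style), so that the model can effectively postpone writing output, can be illustrated with a minimal Python sketch. This is not the authors' implementation; the function names, the `<blank>` symbol, and the chunk-wise READ/WRITE loop are illustrative assumptions.

```python
def ctc_collapse(tokens, blank="<blank>"):
    """Collapse a CTC-style output: merge consecutive repeats, drop blanks.

    Emitting a blank (or repeating the previous token) adds nothing new to
    the translation, which is how a model of this kind can defer its WRITE
    action until more source context has been READ.
    """
    out = []
    prev = None
    for tok in tokens:
        if tok != prev and tok != blank:
            out.append(tok)
        prev = tok
    return out


def simulate_streaming(source_chunks, decode_chunk, blank="<blank>"):
    """Alternate READ (consume one source chunk) and WRITE (emit the
    collapsed tokens decoded for that chunk). `decode_chunk` stands in
    for a non-autoregressive decoder that fills a chunk in parallel."""
    translation = []
    for chunk in source_chunks:                       # READ action
        raw = decode_chunk(chunk)                     # parallel intra-chunk decode
        translation.extend(ctc_collapse(raw, blank))  # WRITE action
    return translation
```

For example, a decoded chunk `["the", "the", "<blank>", "cat"]` collapses to `["the", "cat"]`; a chunk of all blanks collapses to nothing, i.e. the model waits.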