On the Information Redundancy in Non-Autoregressive Translation

Zhihao Wang, Longyue Wang, Jinsong Su, Junfeng Yao, Zhaopeng Tu

May 4, 2024, arXiv.org Artificial Intelligence

Token repetition is a typical form of the multi-modality problem in fully non-autoregressive translation (NAT). In this work, we revisit the multi-modality problem in recently proposed NAT models. Our study reveals that these advanced models introduce other types of information redundancy errors that cannot be measured by the conventional metric, the continuous repetition ratio. By manually annotating NAT outputs, we identify two types of information redundancy errors that correspond closely to the lexical and reordering multi-modality problems. Because human annotation is time-consuming and labor-intensive, we propose automatic metrics to evaluate these two types of redundancy errors. Our metrics allow future studies to evaluate new methods and gain a more comprehensive understanding of their effectiveness.
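
To illustrate why the continuous repetition ratio can miss some redundancy, here is a minimal sketch. It computes that ratio under one common assumption: the share of tokens identical to their immediate predecessor, normalized by output length. Definitions vary across NAT papers. Alongside it is a hypothetical, naive count of non-adjacent repeats. Neither function is the metric proposed in this work. `continuous_repetition_ratio` and `non_adjacent_repeats` are illustrative names only.

```python
from typing import List


def continuous_repetition_ratio(tokens: List[str]) -> float:
    """Share of tokens identical to the token immediately before them.

    Assumption: normalized by total output length; other papers may differ.
    """
    if len(tokens) < 2:
        return 0.0
    repeats = sum(1 for prev, cur in zip(tokens, tokens[1:]) if prev == cur)
    return repeats / len(tokens)


def non_adjacent_repeats(tokens: List[str]) -> int:
    """Hypothetical, naive counter: tokens that repeat an earlier token
    without being a consecutive duplicate of the previous token."""
    count = 0
    for i, tok in enumerate(tokens):
        # tokens[: i - 1] excludes the immediately preceding position
        if i >= 2 and tok != tokens[i - 1] and tok in tokens[: i - 1]:
            count += 1
    return count


hyp_consecutive = "we we went to the the market".split()
hyp_scattered = "we went to the market we went".split()

print(round(continuous_repetition_ratio(hyp_consecutive), 3))  # 0.286
print(continuous_repetition_ratio(hyp_scattered))  # 0.0
print(non_adjacent_repeats(hyp_scattered))  # 2
```

The second hypothesis repeats "we went" without any consecutive duplicate, so its continuous repetition ratio is 0.0 even though the output is clearly redundant. The naive counter would also flag legitimate repeats such as function words. This is one reason a more careful automatic metric is needed.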

Keywords: NAT model, redundancy, translation