BlonD: An Automatic Evaluation Metric for Document-level Machine Translation
Jiang, Yuchen, Ma, Shuming, Zhang, Dongdong, Yang, Jian, Huang, Haoyang, Zhou, Ming
arXiv.org Artificial Intelligence
Standard automatic metrics such as BLEU are problematic for document-level MT evaluation. They can neither distinguish document-level improvements in translation quality from sentence-level ones nor identify the specific discourse phenomena that cause translation errors. To address these problems, we propose BlonD, an automatic metric for document-level machine translation evaluation. BlonD takes discourse coherence into account by calculating the recall and distance of check-pointing phrases and tags, and provides comprehensive evaluation scores by combining them with n-grams. Extensive comparisons between BlonD and existing evaluation metrics illustrate their critical distinctions. Experimental results show that BlonD is much more sensitive to document-level quality than previous metrics. Human evaluation also reveals high Pearson correlation between BlonD scores and manual quality judgments.
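The abstract reports agreement with human judgments as a Pearson correlation. As a minimal sketch of how such a value is computed, the snippet below implements the standard Pearson coefficient in plain Python. The function name `pearson_r` and the score lists are hypothetical and chosen only for illustration; they are not data or code from the paper, and the snippet does not implement BlonD itself.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values for illustration only; NOT results from the paper.
metric_scores = [0.21, 0.35, 0.40, 0.52, 0.61]  # one metric score per document
human_scores = [2.0, 3.0, 3.5, 4.0, 4.5]        # one human rating per document
r = pearson_r(metric_scores, human_scores)
print(f"Pearson r = {r:.3f}")
```

A value near 1 indicates that the metric ranks and spaces documents much as human raters do; a value near 0 indicates no linear relationship.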
Mar-22-2021
- Country:
- Asia > South Korea (0.04)
- Oceania > Australia
- North America
- United States
- Maryland > Baltimore (0.04)
- Pennsylvania (0.04)
- Minnesota > Hennepin County
- Minneapolis (0.14)
- Louisiana > Orleans Parish
- New Orleans (0.04)
- Colorado > Boulder County
- Boulder (0.04)
- Canada > Quebec
- Montreal (0.04)
- Europe
- Bulgaria (0.04)
- Germany > Berlin (0.04)
- Slovenia (0.04)
- Switzerland > Zürich
- Zürich (0.14)
- Latvia > Riga Municipality
- Riga (0.04)
- Italy > Tuscany
- Florence (0.04)
- Denmark > Capital Region
- Copenhagen (0.05)
- Sweden > Uppsala County
- Uppsala (0.04)
- Portugal > Lisbon
- Lisbon (0.04)
- Spain > Catalonia
- Barcelona Province > Barcelona (0.04)
- France > Île-de-France
- Belgium > Brussels-Capital Region
- Brussels (0.04)
- Genre:
- Research Report > New Finding (0.34)
- Technology: