BMX: Boosting Machine Translation Metrics with Explainability
Leiter, Christoph, Nguyen, Hoa, Eger, Steffen
–arXiv.org Artificial Intelligence
State-of-the-art machine translation evaluation metrics are based on black-box language models. Hence, recent works consider their explainability with the goals of better understandability for humans and better metric analysis, including failure cases. In contrast, we explicitly leverage explanations to boost the metrics' performance. In particular, we perceive explanations as word-level scores, which we convert, via power means, into sentence-level scores. We combine this sentence-level score with the original metric to obtain a better metric. Our extensive evaluation and analysis across 5 datasets, 5 metrics and 4 explainability techniques show that some configurations reliably improve the original metrics' correlation with human judgment. On two held-out test datasets, we obtain improvements in 15/18 and 4/4 cases, respectively. The gains in Pearson correlation are up to 0.032 and 0.055, respectively. We make our code available.
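The aggregation step described in the abstract can be sketched as follows. A power mean with parameter p interpolates between the minimum (p → −∞), geometric mean (p → 0), arithmetic mean (p = 1), and maximum (p → +∞) of the word-level explanation scores. The linear combination with the original metric shown here is a hypothetical illustration; the paper's exact combination scheme and parameter choices may differ.

```python
import math

def power_mean(scores, p):
    """Power mean M_p of positive word-level scores.

    M_p(x) = ((1/n) * sum(x_i ** p)) ** (1/p); the limit p -> 0
    is the geometric mean, and p = 1 gives the arithmetic mean.
    """
    n = len(scores)
    if p == 0:
        # Geometric mean, the limiting case of the power mean at p = 0.
        return math.exp(sum(math.log(s) for s in scores) / n)
    return (sum(s ** p for s in scores) / n) ** (1 / p)

def boosted_metric(original_score, word_scores, p=1.0, weight=0.5):
    """Combine a sentence-level metric score with aggregated word-level
    explanation scores. The weighted sum here is an illustrative choice,
    not necessarily the combination used in the paper."""
    sentence_level = power_mean(word_scores, p)
    return weight * original_score + (1 - weight) * sentence_level
```

For example, word-level scores [2, 8] have geometric mean 4 (p = 0) and arithmetic mean 5 (p = 1); sweeping p lets one choose how strongly low-scoring words dominate the sentence-level score.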
Dec-20-2022