MQM Re-Annotation: A Technique for Collaborative Evaluation of Machine Translation

Parker Riley, Daniel Deutsch, Mara Finkelstein, Colten DiIanni, Juraj Juraska, Markus Freitag

arXiv.org (Artificial Intelligence)

Human evaluation of machine translation is in an arms race with translation model quality: as models improve, evaluation methods must also improve to ensure that quality gains are not lost in evaluation noise. To this end, we experiment with a two-stage version of the current state-of-the-art translation evaluation paradigm (MQM), which we call MQM re-annotation. In this setup, an MQM annotator reviews and edits a set of pre-existing MQM annotations that may have come from the same annotator, another human annotator, or an automatic MQM annotation system. We demonstrate that rater behavior in re-annotation aligns with our goals, and that re-annotation yields higher-quality annotations, mostly by finding errors that were missed during the first pass.
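To make the two-stage setup concrete, here is a minimal Python sketch of the data flow. The schema and the severity weights (major = 5, minor = 1) are assumptions drawn from common MQM practice, not details given in this abstract, and the names `MQMError`, `mqm_score`, and `re_annotate` are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MQMError:
    start: int      # character offset of the error span in the translation
    end: int
    category: str   # e.g. "Accuracy/Mistranslation"
    severity: str   # "major" or "minor"

# Assumed weights; common MQM practice penalizes major errors more heavily.
SEVERITY_WEIGHTS = {"major": 5.0, "minor": 1.0}

def mqm_score(errors: list[MQMError]) -> float:
    """Segment-level MQM penalty: the weighted sum of annotated errors."""
    return sum(SEVERITY_WEIGHTS[e.severity] for e in errors)

def re_annotate(first_pass: list[MQMError],
                keep: set[int],
                added: list[MQMError]) -> list[MQMError]:
    """Second stage: the re-annotator reviews each first-pass error
    (from the same rater, another human, or an automatic system),
    keeping the ones indexed in `keep` and appending newly found errors."""
    return [e for i, e in enumerate(first_pass) if i in keep] + added

# Example: the re-annotator keeps both first-pass errors and adds a
# major error the first pass missed, raising the segment penalty.
first_pass = [MQMError(0, 5, "Fluency/Grammar", "minor"),
              MQMError(12, 20, "Accuracy/Mistranslation", "minor")]
revised = re_annotate(first_pass, keep={0, 1},
                      added=[MQMError(30, 41, "Accuracy/Omission", "major")])
print(mqm_score(first_pass), mqm_score(revised))  # 2.0 7.0
```

In this framing, the abstract's main finding corresponds to the `added` list doing most of the work: re-annotation improves annotation quality chiefly by surfacing errors missed in the first pass, rather than by deleting first-pass errors.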