Don't Sweat the Small Stuff: Segment-Level Meta-Evaluation Based on Pairwise Difference Correlation
DiIanni, Colten, Deutsch, Daniel
–arXiv.org Artificial Intelligence
This paper introduces Pairwise Difference Pearson (PDP), a novel segment-level meta-evaluation metric for Machine Translation (MT) that address limitations in previous Pearson's $ρ$-based and and Kendall's $τ$-based meta-evaluation approaches. PDP is a correlation-based metric that utilizes pairwise differences rather than raw scores. It draws on information from all segments for a more robust understanding of score distributions and uses segment-wise pairwise differences to refine Global Pearson to intra-segment score comparisons. Analysis on the WMT'24 shared task shows PDP properly ranks sentinel evaluation metrics and better aligns with human error weightings than previous work. Noise injection analysis demonstrates PDP's robustness to random noise, segment bias, and system bias while highlighting its sensitivity to extreme outliers.
arXiv.org Artificial Intelligence
Oct-1-2025
- Country:
- Asia
- Middle East > UAE
- Abu Dhabi Emirate > Abu Dhabi (0.04)
- Singapore (0.04)
- Thailand > Bangkok
- Bangkok (0.04)
- Middle East > UAE
- Europe
- Belgium (0.04)
- Denmark > Capital Region
- Copenhagen (0.04)
- Italy > Tuscany
- Florence (0.04)
- North America > United States
- Florida > Miami-Dade County > Miami (0.04)
- Asia
- Genre:
- Research Report (0.64)
- Technology: