Better Late Than Never: Evaluation of Latency Metrics for Simultaneous Speech-to-Text Translation
Polák, Peter, Papi, Sara, Bentivogli, Luisa, Bojar, Ondřej
arXiv.org Artificial Intelligence
Simultaneous speech-to-text translation (SimulST) systems have to balance translation quality with latency, the delay between speech input and the translated output. While quality evaluation is well established, accurate latency measurement remains a challenge. Existing metrics often produce inconsistent or misleading results, especially in the widely used short-form setting, where speech is artificially presegmented. In this paper, we present the first comprehensive analysis of SimulST latency metrics across language pairs, systems, and both short- and long-form regimes. We uncover a structural bias in current metrics related to segmentation that undermines fair and meaningful comparisons. To address this, we introduce YAAL (Yet Another Average Lagging), a refined latency metric that delivers more accurate evaluations in the short-form regime. We extend YAAL to LongYAAL for unsegmented audio and propose SoftSegmenter, a novel resegmentation tool based on word-level alignment. Our experiments show that YAAL and LongYAAL outperform popular latency metrics, while SoftSegmenter enhances alignment quality in long-form evaluation, together enabling more reliable assessments of SimulST systems.
Sep-23-2025
- Country:
  - Asia
    - Middle East > UAE
      - Abu Dhabi Emirate > Abu Dhabi (0.04)
    - Singapore (0.04)
    - Thailand > Bangkok
      - Bangkok (0.04)
    - Vietnam > Thái Bình Province
      - Thái Bình (0.04)
  - Europe
  - North America
    - Canada
    - Dominican Republic (0.04)
    - United States
      - Florida > Miami-Dade County
        - Miami (0.04)
      - Pennsylvania > Allegheny County
        - Pittsburgh (0.04)
- Genre:
  - Research Report (1.00)
- Technology: