Edit Distances and Their Applications to Downstream Tasks in Research and Commercial Contexts

Carmo, Félix do, Kanojia, Diptesh

arXiv.org Artificial Intelligence 

Edit distances are a class of metrics that quantify the similarity between two text sequences by calculating the minimum number of operations required to transform one sequence into the other. These operations typically include insertion, deletion, substitution, and movement of characters or words. Edit distances are used well beyond simple string comparison: they are applied extensively in evaluating machine-translated text against human references, in quality estimation, and in post-editing tasks. This tutorial is aimed at researchers of machine translation and of human translation, as well as corporate members of AMTA. It focuses on the uses of edit distances, such as Translation Edit Rate (TER; Snover et al., 2006), as proxies for translation effort and as inputs to downstream tasks such as MT evaluation and post-editing, error annotation with MQM (Burchardt, 2013), quality estimation (QE; Specia et al., 2022) and automatic post-editing (APE; do Carmo et al., 2021). Applications of edit distances in downstream tasks often assume that they accurately represent the work done by post-editors and the real errors that need to be corrected in MT output. We will discuss how imperfectly edit distances capture the details of this error-correction work, and what this implies for researchers and for commercial applications. On the commercial side, we will discuss the integration of edit distances in computer-assisted translation tools and how the perceived connection between edit distances and post-editor effort affects the definition of translator rates.
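
As an illustration of the definition above, the following is a minimal sketch of a word-level edit distance and a length-normalised edit rate. It is not the TER implementation of Snover et al. (2006): it counts only insertions, deletions and substitutions and omits the movement (shift) operation. The function names `word_edit_distance` and `edit_rate`, the whitespace tokenisation and the normalisation by reference length are assumptions made for this example.

```python
def word_edit_distance(hyp: str, ref: str) -> int:
    """Minimum number of word insertions, deletions and substitutions
    needed to turn `hyp` into `ref` (no shift operation, unlike TER)."""
    h, r = hyp.split(), ref.split()  # assumption: whitespace tokenisation
    # prev[j] = distance between the hypothesis words seen so far and r[:j]
    prev = list(range(len(r) + 1))
    for i, hyp_word in enumerate(h, start=1):
        curr = [i]  # turning i hypothesis words into an empty reference
        for j, ref_word in enumerate(r, start=1):
            cost = 0 if hyp_word == ref_word else 1
            curr.append(min(
                prev[j] + 1,         # delete hyp_word
                curr[j - 1] + 1,     # insert ref_word
                prev[j - 1] + cost,  # keep or substitute
            ))
        prev = curr
    return prev[-1]


def edit_rate(hyp: str, ref: str) -> float:
    """Edit distance divided by reference length (a TER-like ratio, without shifts)."""
    ref_len = len(ref.split())
    if ref_len == 0:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(hyp, ref) / ref_len


if __name__ == "__main__":
    print(word_edit_distance("the cat sat on mat", "the cat sat on the mat"))
    # A swapped word pair costs two substitutions here; a metric with a
    # shift operation could count the same change as a single edit.
    print(word_edit_distance("b a", "a b"))
```

The second call shows one way in which such counts can diverge from the work a post-editor actually does: a single reordering is scored as two edits when no movement operation is available.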
