AITopics | mqm

Collaborating Authors

mqm

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Enhancing Human Evaluation in Machine Translation with Comparative Judgment

Song, Yixiao, Riley, Parker, Deutsch, Daniel, Freitag, Markus

arXiv.org Artificial IntelligenceFeb-24-2025

Human evaluation is crucial for assessing rapidly evolving language models but is influenced by annotator proficiency and task design. This study explores the integration of comparative judgment into human annotation for machine translation (MT) and evaluates three annotation setups-point-wise Multidimensional Quality Metrics (MQM), side-by-side (SxS) MQM, and its simplified version SxS relative ranking (RR). In MQM, annotators mark error spans with categories and severity levels. SxS MQM extends MQM to pairwise error annotation for two translations of the same input, while SxS RR focuses on selecting the better output without labeling errors. Key findings are: (1) the SxS settings achieve higher inter-annotator agreement than MQM; (2) SxS MQM enhances inter-translation error marking consistency compared to MQM by, on average, 38.5% for explicitly compared MT systems and 19.5% for others; (3) all annotation settings return stable system rankings, with SxS RR offering a more efficient alternative to (SxS) MQM; (4) the SxS settings highlight subtle errors overlooked in MQM without altering absolute system evaluations. To spur further research, we will release the triply annotated datasets comprising 377 ZhEn and 104 EnDe annotation examples.

annotation, annotator, mqm, (14 more...)

arXiv.org Artificial Intelligence

2502.17797

Country:

Asia > Singapore (0.05)
South America > Brazil (0.04)
Europe > Iceland > Capital Region > Reykjavik (0.04)
(12 more...)

Genre: Research Report > New Finding (0.93)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Add feedback

Error Span Annotation: A Balanced Approach for Human Evaluation of Machine Translation

Kocmi, Tom, Zouhar, Vilém, Avramidis, Eleftherios, Grundkiewicz, Roman, Karpinska, Marzena, Popović, Maja, Sachan, Mrinmaya, Shmatova, Mariya

arXiv.org Artificial IntelligenceJun-17-2024

High-quality Machine Translation (MT) evaluation relies heavily on human judgments. Comprehensive error classification methods, such as Multidimensional Quality Metrics (MQM), are expensive as they are time-consuming and can only be done by experts, whose availability may be limited especially for low-resource languages. On the other hand, just assigning overall scores, like Direct Assessment (DA), is simpler and faster and can be done by translators of any level, but are less reliable. In this paper, we introduce Error Span Annotation (ESA), a human evaluation protocol which combines the continuous rating of DA with the high-level error severity span marking of MQM. We validate ESA by comparing it to MQM and DA for 12 MT systems and one human reference translation (English to German) from WMT23. The results show that ESA offers faster and cheaper annotations than MQM at the same quality level, without the requirement of expensive MQM experts.

annotator, machine translation, protocol, (10 more...)

arXiv.org Artificial Intelligence

2406.1158

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Experts, Errors, and Context: A Large-Scale Study of Human Evaluation for Machine Translation

Freitag, Markus, Foster, George, Grangier, David, Ratnakar, Viresh, Tan, Qijun, Macherey, Wolfgang

arXiv.org Artificial IntelligenceApr-29-2021

Human evaluation of modern high-quality machine translation systems is a difficult problem, and there is increasing evidence that inadequate evaluation procedures can lead to erroneous conclusions. While there has been considerable research on human evaluation, the field still lacks a commonly-accepted standard procedure. As a step toward this goal, we propose an evaluation methodology grounded in explicit error analysis, based on the Multidimensional Quality Metrics (MQM) framework. We carry out the largest MQM research study to date, scoring the outputs of top systems from the WMT 2020 shared task in two language pairs using annotations provided by professional translators with access to full document context. We analyze the resulting data extensively, finding among other results a substantially different ranking of evaluated systems from the one established by the WMT crowd workers, exhibiting a clear preference for human over machine output. Surprisingly, we also find that automatic metrics based on pre-trained embeddings can outperform human crowd workers. We make our corpus publicly available for further research.

correlation, evaluation, translation, (14 more...)

arXiv.org Artificial Intelligence

2104.14478

Country:

Asia > Japan > Honshū > Tōhoku (0.04)
Europe > Portugal > Lisbon > Lisbon (0.04)
Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
(4 more...)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Add feedback

Composition Properties of Inferential Privacy for Time-Series Data

Song, Shuang, Chaudhuri, Kamalika

arXiv.org Machine LearningJul-10-2017

With the proliferation of mobile devices and the internet of things, developing principled solutions for privacy in time series applications has become increasingly important. While differential privacy is the gold standard for database privacy, many time series applications require a different kind of guarantee, and a number of recent works have used some form of inferential privacy to address these situations. However, a major barrier to using inferential privacy in practice is its lack of graceful composition -- even if the same or related sensitive data is used in multiple releases that are safe individually, the combined release may have poor privacy properties. In this paper, we study composition properties of a form of inferential privacy called Pufferfish when applied to time-series data. We show that while general Pufferfish mechanisms may not compose gracefully, a specific Pufferfish mechanism, called the Markov Quilt Mechanism, which was recently introduced, has strong composition properties comparable to that of pure differential privacy when applied to time series data.

artificial intelligence, markov quilt, privacy, (15 more...)

arXiv.org Machine Learning

1707.02702

Genre: Research Report (0.64)

Industry: Information Technology > Security & Privacy (0.48)

Technology:

Information Technology > Communications (0.54)
Information Technology > Artificial Intelligence (0.50)
Information Technology > Security & Privacy (0.48)

Add feedback