Collaborating Authors

Federmann, Christian


Navigating the Metrics Maze: Reconciling Score Magnitudes and Accuracies

arXiv.org Artificial Intelligence

Ten years ago a single metric, BLEU, governed progress in machine translation research. For better or worse, there is no such consensus today, and consequently it is difficult for researchers to develop and retain the kinds of heuristic intuitions about metric deltas that drove earlier research and deployment decisions. This paper investigates the "dynamic range" of a number of modern metrics in an effort to provide a collective understanding of the meaning of differences in scores both within and among metrics; in other words, we ask what point difference X in metric Y is required between two systems for humans to notice. We conduct our evaluation on a new large dataset, ToShip23, using it to discover deltas at which metrics achieve system-level differences that are meaningful to humans, which we measure by pairwise system accuracy. We additionally show that this method of establishing delta-accuracy is more stable than the standard use of statistical p-values with regard to test set size. Where data size permits, we also explore the effect of metric deltas and accuracy across finer-grained features such as translation direction, domain, and system closeness.
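
As a rough illustration of the pairwise system accuracy discussed above, the sketch below computes the fraction of system pairs on which a metric's ranking agrees with human rankings, restricted to pairs whose metric delta exceeds a threshold. The scores and function names are hypothetical placeholders, not the paper's ToShip23 setup.

```python
from itertools import combinations

def pairwise_accuracy(metric_scores, human_scores, min_delta=0.0):
    """Fraction of system pairs where the metric agrees with humans,
    counting only pairs whose metric delta is at least `min_delta`."""
    agree, total = 0, 0
    for a, b in combinations(metric_scores.keys(), 2):
        delta = metric_scores[a] - metric_scores[b]
        if abs(delta) < min_delta:
            continue  # pair falls below the delta of interest
        total += 1
        if (delta > 0) == (human_scores[a] - human_scores[b] > 0):
            agree += 1
    return agree / total if total else float("nan")

# Hypothetical system-level metric and human scores for three MT systems.
metric = {"sysA": 84.1, "sysB": 83.2, "sysC": 80.5}
human  = {"sysA": 0.12, "sysB": 0.10, "sysC": 0.01}

# Sweep deltas to see at which score difference the metric becomes reliable.
for d in (0.0, 0.5, 1.0, 2.0):
    print(d, pairwise_accuracy(metric, human, min_delta=d))
```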


GEMBA-MQM: Detecting Translation Quality Error Spans with GPT-4

arXiv.org Artificial Intelligence

This paper introduces GEMBA-MQM, a GPT-based evaluation metric designed to detect translation quality errors, specifically for the quality estimation setting without the need for human reference translations. Based on the power of large language models (LLMs), GEMBA-MQM employs a fixed three-shot prompting technique, querying the GPT-4 model to mark error quality spans. Compared to previous works, our method has language-agnostic prompts, thus avoiding the need for manual prompt preparation for new languages. While preliminary results indicate that GEMBA-MQM achieves state-of-the-art accuracy for system ranking, we advise caution when using it in academic works to demonstrate improvements over other methods due to its dependence on the proprietary, black-box GPT model.
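
To make the prompting setup concrete, here is a minimal sketch of a GEMBA-MQM-style query: a fixed few-shot prompt asking GPT-4 to mark error spans for a source/translation pair without a human reference. The prompt wording, the single in-context example, and the model name are illustrative assumptions; the authors' released templates are authoritative.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Illustrative single example; the actual metric uses three fixed examples
# covering different MQM error types and severities.
FEW_SHOT = """\
Source (English): The cat sat on the mat.
Translation (German): Die Katze sass auf dem Hut.
Errors: "Hut" - major/accuracy (mistranslation of "mat")
"""

def mark_error_spans(source, translation, src_lang, tgt_lang):
    prompt = (
        f"{FEW_SHOT}\n"
        f"Source ({src_lang}): {source}\n"
        f"Translation ({tgt_lang}): {translation}\n"
        "Errors:"
    )
    response = client.chat.completions.create(
        model="gpt-4",          # placeholder model identifier
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content

print(mark_error_spans("I have no idea.", "Ich habe keine Ahnung.",
                       "English", "German"))
```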


Large Language Models Are State-of-the-Art Evaluators of Translation Quality

arXiv.org Artificial Intelligence

We describe GEMBA, a GPT-based metric for assessment of translation quality, which works both with a reference translation and without. In our evaluation, we focus on zero-shot prompting, comparing four prompt variants in two modes, based on the availability of the reference. We investigate nine versions of GPT models, including ChatGPT and GPT-4. We show that our method for translation quality assessment only works with GPT-3.5 and larger models. Compared to results from the WMT22 Metrics shared task, our method achieves state-of-the-art accuracy in both modes when compared to MQM-based human labels. Our results are valid on the system level for all three WMT22 Metrics shared task language pairs, namely English into German, English into Russian, and Chinese into English. This provides a first glimpse into the usefulness of pre-trained, generative large language models for quality assessment of translations. We publicly release all our code and prompt templates used for the experiments described in this work, as well as all corresponding scoring results, to allow for external validation and reproducibility.
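
The sketch below paraphrases one zero-shot scoring prompt (a direct 0-100 quality score) in the two modes described above, with and without a reference translation. The wording is an approximation for illustration only; the released prompt templates are the authoritative versions.

```python
def gemba_prompt(source, hypothesis, src_lang, tgt_lang, reference=None):
    """Build a zero-shot, direct-assessment-style scoring prompt."""
    lines = [
        f"Score the following translation from {src_lang} to {tgt_lang} "
        "on a continuous scale from 0 to 100, where 0 means no meaning "
        "preserved and 100 means a perfect translation.",
        f"{src_lang} source: \"{source}\"",
        f"{tgt_lang} translation: \"{hypothesis}\"",
    ]
    if reference is not None:  # reference-based mode
        lines.insert(2, f"{tgt_lang} human reference: \"{reference}\"")
    lines.append("Score:")
    return "\n".join(lines)

# Reference-free (quality estimation) mode:
print(gemba_prompt("How are you?", "Wie geht es dir?", "English", "German"))
# Reference-based mode:
print(gemba_prompt("How are you?", "Wie geht es dir?", "English", "German",
                   reference="Wie geht es Ihnen?"))
```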


Multi-Engine Machine Translation as a Lifelong Machine Learning Problem

AAAI Conferences

We describe an approach for multi-engine machine translation that uses machine learning methods to train one or several classifiers for a given set of candidate translations. Contrary to existing approaches in quality estimation which only consider a single translation at a time, we explicitly model pairwise comparison with our feature vectors. We discuss several challenges our method faces and how lifelong machine learning could be applied to resolve them. We also show how the proposed architecture can be extended to allow human feedback to be included into the training process, improving the system's selection process over time.
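
A minimal sketch of the pairwise-comparison idea described above: a binary classifier is trained on feature-vector differences of candidate pairs, and the final translation is the candidate that wins the most pairwise comparisons. The feature vectors, training labels, and the logistic-regression classifier are placeholders for illustration, not the paper's actual features or learners.

```python
from itertools import combinations
import numpy as np
from sklearn.linear_model import LogisticRegression

def pair_features(feats_a, feats_b):
    # Represent the pair (a, b) as the difference of their feature vectors.
    return feats_a - feats_b

# Hypothetical training data: per-candidate features and which member of
# each pair was preferred (1 = first is better, 0 = second is better).
X_train = np.array([pair_features(np.array(a), np.array(b))
                    for a, b in [([0.9, 0.2], [0.4, 0.6]),
                                 ([0.3, 0.8], [0.7, 0.1])]])
y_train = np.array([1, 0])

clf = LogisticRegression().fit(X_train, y_train)

def select_best(candidates):
    """Pick the candidate translation with the most pairwise wins."""
    wins = {i: 0 for i in range(len(candidates))}
    for i, j in combinations(range(len(candidates)), 2):
        x = pair_features(candidates[i], candidates[j]).reshape(1, -1)
        better_i = clf.predict(x)[0] == 1
        wins[i if better_i else j] += 1
    return max(wins, key=wins.get)

# Hypothetical feature vectors for the outputs of three MT engines.
engines = [np.array([0.8, 0.3]), np.array([0.5, 0.5]), np.array([0.2, 0.9])]
print("selected engine:", select_best(engines))
```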