AITopics | incorrect translation

Collaborating Authors

incorrect translation

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Machine Translation Meta Evaluation through Translation Accuracy Challenge Sets

Moghe, Nikita, Fazla, Arnisa, Amrhein, Chantal, Kocmi, Tom, Steedman, Mark, Birch, Alexandra, Sennrich, Rico, Guillou, Liane

arXiv.org Artificial IntelligenceJan-29-2024

Recent machine translation (MT) metrics calibrate their effectiveness by correlating with human judgement but without any insights about their behaviour across different error types. Challenge sets are used to probe specific dimensions of metric behaviour but there are very few such datasets and they either focus on a limited number of phenomena or a limited number of language pairs. We introduce ACES, a contrastive challenge set spanning 146 language pairs, aimed at discovering whether metrics can identify 68 translation accuracy errors. These phenomena range from simple alterations at the word/character level to more complex errors based on discourse and real-world knowledge. We conduct a large-scale study by benchmarking ACES on 50 metrics submitted to the WMT 2022 and 2023 metrics shared tasks. We benchmark metric performance, assess their incremental performance over successive campaigns, and measure their sensitivity to a range of linguistic phenomena. We also investigate claims that Large Language Models (LLMs) are effective as MT evaluators by evaluating on ACES. Our results demonstrate that different metric families struggle with different phenomena and that LLM-based methods fail to demonstrate reliable performance. Our analyses indicate that most metrics ignore the source sentence, tend to prefer surface-level overlap and end up incorporating properties of base models which are not always beneficial. We expand ACES to include error span annotations, denoted as SPAN-ACES and we use this dataset to evaluate span-based error metrics showing these metrics also need considerable improvement. Finally, we provide a set of recommendations for building better MT metrics, including focusing on error labels instead of scores, ensembling, designing strategies to explicitly focus on the source sentence, focusing on semantic content and choosing the right base model for representations.

computational linguistic, incorrect translation, translation, (14 more...)

arXiv.org Artificial Intelligence

2401.16313

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Singapore (0.04)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
(26 more...)

Genre: Research Report > New Finding (0.68)

Industry:

Leisure & Entertainment (0.67)
Media (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

An approach for mistranslation removal from popular dataset for Indic MT Task

Das, Sudhansu Bala, Rodrigues, Leo Raphael, Mishra, Tapas Kumar, Patra, Bidyut Kr.

arXiv.org Artificial IntelligenceJan-12-2024

The conversion of content from one language to another utilizing a computer system is known as Machine Translation (MT). Various techniques have come up to ensure effective translations that retain the contextual and lexical interpretation of the source language. End-to-end Neural Machine Translation (NMT) is a popular technique and it is now widely used in real-world MT systems. Massive amounts of parallel datasets (sentences in one language alongside translations in another) are required for MT systems. These datasets are crucial for an MT system to learn linguistic structures and patterns of both languages during the training phase. One such dataset is Samanantar, the largest publicly accessible parallel dataset for Indian languages (ILs). Since the corpus has been gathered from various sources, it contains many incorrect translations. Hence, the MT systems built using this dataset cannot perform to their usual potential. In this paper, we propose an algorithm to remove mistranslations from the training corpus and evaluate its performance and efficiency. Two Indic languages (ILs), namely, Hindi (HIN) and Odia (ODI) are chosen for the experiment. A baseline NMT system is built for these two ILs, and the effect of different dataset sizes is also investigated. The quality of the translations in the experiment is evaluated using standard metrics such as BLEU, METEOR, and RIBES. From the results, it is observed that removing the incorrect translation from the dataset makes the translation quality better. It is also noticed that, despite the fact that the ILs-English and English-ILs systems are trained using the same corpus, ILs-English works more effectively across all the evaluation metrics.

dataset, machine translation, translation, (14 more...)

arXiv.org Artificial Intelligence

2401.06398

Country:

Asia > India > Odisha (0.14)
North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > India > Uttar Pradesh (0.04)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Add feedback

ACES: Translation Accuracy Challenge Sets at WMT 2023

Amrhein, Chantal, Moghe, Nikita, Guillou, Liane

arXiv.org Artificial IntelligenceNov-2-2023

We benchmark the performance of segmentlevel metrics submitted to WMT 2023 using the ACES Challenge Set (Amrhein et al., 2022). The challenge set consists of 36K examples representing challenges from 68 phenomena and covering 146 language pairs. The phenomena range from simple perturbations at the word/character level to more complex errors based on discourse and real-world knowledge. For each metric, we provide a detailed profile of performance over a range of error categories as well as an overall ACES-Score for quick comparison. We also measure the incremental performance of the metrics submitted to both WMT 2023 and 2022. We find that 1) there is no clear winner among the metrics submitted to WMT 2023, and 2) performance change between the 2023 and 2022 versions of the metrics is highly variable. Our recommendations are similar to those from WMT 2022. Metric developers should focus on: building ensembles of metrics from different design families, developing metrics that pay more attention to the source and rely less on surface-level overlap, and carefully determining the influence of multilingual embeddings on MT evaluation.

category, metric, translation, (14 more...)

arXiv.org Artificial Intelligence

2311.01153

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.05)
North America > United States > Texas (0.04)
(12 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

INSTRUCTSCORE: Explainable Text Generation Evaluation with Finegrained Feedback

Xu, Wenda, Wang, Danqing, Pan, Liangming, Song, Zhenqiao, Freitag, Markus, Wang, William Yang, Li, Lei

arXiv.org Artificial IntelligenceOct-26-2023

Automatically evaluating the quality of language generation is critical. Although recent learned metrics show high correlation with human judgement, these metrics can not explain their verdict or associate the scores with defects in generated text. To address this limitation, we present InstructScore, an explainable evaluation metric for text generation. By harnessing both explicit human instruction and the implicit knowledge of GPT-4, we fine-tune a text evaluation metric based on LLaMA, producing both a score for generated text and a human readable diagnostic report. We evaluate InstructScore on a variety of generation tasks, including translation, captioning, data-to-text and commonsense generation. Experiments show that our 7B model surpasses all other unsupervised metrics, including those based on 175B GPT-3 and GPT-4. Surprisingly, our InstructScore, even without direct supervision from human-rated data, achieves performance levels on par with state-of-the-art metrics like COMET22, which were fine-tuned on human ratings.

explanation, major minor, translation, (12 more...)

arXiv.org Artificial Intelligence

2305.14282

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
Europe > Switzerland (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
(11 more...)

Genre: Research Report > New Finding (0.67)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Health Care Technology (1.00)
Education (1.00)
(5 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

ACES: Translation Accuracy Challenge Sets for Evaluating Machine Translation Metrics

Amrhein, Chantal, Moghe, Nikita, Guillou, Liane

arXiv.org Artificial IntelligenceDec-6-2022

As machine translation (MT) metrics improve their correlation with human judgement every year, it is crucial to understand the limitations of such metrics at the segment level. Specifically, it is important to investigate metric behaviour when facing accuracy errors in MT because these can have dangerous consequences in certain contexts (e.g., legal, medical). We curate ACES, a translation accuracy challenge set, consisting of 68 phenomena ranging from simple perturbations at the word/character level to more complex errors based on discourse and real-world knowledge. We use ACES to evaluate a wide range of MT metrics including the submissions to the WMT 2022 metrics shared task and perform several analyses leading to general recommendations for metric developers. We recommend: a) combining metrics with different strengths, b) developing metrics that give more weight to the source and less to surface-level overlap with the reference and c) explicitly modelling additional language-specific information beyond what is available via multilingual embeddings.

artificial intelligence, natural language, translation, (16 more...)

arXiv.org Artificial Intelligence

2210.15615

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Switzerland > Zürich > Zürich (0.04)
Europe > Denmark > Capital Region > Copenhagen (0.04)
(24 more...)

Genre: Research Report (0.81)

Industry: Law (0.34)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Add feedback