AITopics | mistranslation

Collaborating Authors

mistranslation

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

DeepSeek-R1 vs. o3-mini: How Well can Reasoning LLMs Evaluate MT and Summarization?

Larionov, Daniil, Takeshita, Sotaro, Zhang, Ran, Chen, Yanran, Leiter, Christoph, Wang, Zhipin, Greisinger, Christian, Eger, Steffen

arXiv.org Artificial IntelligenceJun-2-2025

Reasoning-enabled large language models (LLMs) excel in logical tasks, yet their utility for evaluating natural language generation remains unexplored. This study systematically compares reasoning LLMs with non-reasoning counterparts across machine translation and text summarization evaluation tasks. We evaluate eight models spanning state-of-the-art reasoning models (DeepSeek-R1, OpenAI o3), their distilled variants (8B-70B parameters), and equivalent non-reasoning LLMs. Experiments on WMT23 and SummEval benchmarks reveal architecture and task-dependent benefits: OpenAI o3-mini models show improved performance with increased reasoning on MT, while DeepSeek-R1 and generally underperforms compared to its non-reasoning variant except in summarization consistency evaluation. Correlation analysis demonstrates that reasoning token usage correlates with evaluation quality only in specific models, while almost all models generally allocate more reasoning tokens when identifying more quality issues. Distillation maintains reasonable performance up to 32B parameter models but degrades substantially at 8B scale. This work provides the first assessment of reasoning LLMs for NLG evaluation and comparison to non-reasoning models. We share our code to facilitate further research: https://github.com/NL2G/reasoning-eval.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2504.0812

Country:

Asia (0.67)
Europe (0.46)
North America > United States (0.28)

Genre:

Research Report > New Finding (0.93)
Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.46)

Add feedback

Using Source-Side Confidence Estimation for Reliable Translation into Unfamiliar Languages

Sible, Kenneth J., Chiang, David

arXiv.org Artificial IntelligenceMar-30-2025

We present an interactive machine translation (MT) system designed for users who are not proficient in the target language. It aims to improve trustworthiness and explainability by identifying potentially mistranslated words and allowing the user to intervene to correct mistranslations. However, confidence estimation in machine translation has traditionally focused on the target side. Whereas the conventional approach to source-side confidence estimation would have been to project target word probabilities to the source side via word alignments, we propose a direct, alignment-free approach that measures how sensitive the target word probabilities are to changes in the source embeddings. Experimental results show that our method outperforms traditional alignment-based methods at detection of mistranslations.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2503.23305

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > Greenland (0.05)
Oceania > Australia > New South Wales > Sydney (0.04)
(4 more...)

Genre: Research Report > New Finding (0.35)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

Add feedback

MQM-APE: Toward High-Quality Error Annotation Predictors with Automatic Post-Editing in LLM Translation Evaluators

Lu, Qingyu, Ding, Liang, Zhang, Kanjian, Zhang, Jinxia, Tao, Dacheng

arXiv.org Artificial IntelligenceDec-16-2024

Large Language Models (LLMs) have shown significant potential as judges for Machine Translation (MT) quality assessment, providing both scores and fine-grained feedback. Although approaches such as GEMBA-MQM have shown state-of-the-art performance on reference-free evaluation, the predicted errors do not align well with those annotated by human, limiting their interpretability as feedback signals. To enhance the quality of error annotations predicted by LLM evaluators, we introduce a universal and training-free framework, $\textbf{MQM-APE}$, based on the idea of filtering out non-impactful errors by Automatically Post-Editing (APE) the original translation based on each error, leaving only those errors that contribute to quality improvement. Specifically, we prompt the LLM to act as 1) $\textit{evaluator}$ to provide error annotations, 2) $\textit{post-editor}$ to determine whether errors impact quality improvement and 3) $\textit{pairwise quality verifier}$ as the error filter. Experiments show that our approach consistently improves both the reliability and quality of error spans against GEMBA-MQM, across eight LLMs in both high- and low-resource languages. Orthogonal to trained approaches, MQM-APE complements translation-specific evaluators such as Tower, highlighting its broad applicability. Further analysis confirms the effectiveness of each module and offers valuable insights into evaluator design and LLMs selection.

large language model, machine learning, translation, (18 more...)

arXiv.org Artificial Intelligence

2409.14335

Country:

Asia > Singapore (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

Add feedback

"Is Hate Lost in Translation?": Evaluation of Multilingual LGBTQIA+ Hate Speech Detection

Chan, Fai Leui, Nguyen, Duke, Joshi, Aditya

arXiv.org Artificial IntelligenceOct-23-2024

This paper explores the challenges of detecting LGBTQIA+ hate speech of large language models across multiple languages, including English, Italian, Chinese and (code-switched) English-Tamil, examining the impact of machine translation and whether the nuances of hate speech are preserved across translation. We examine the hate speech detection ability of zero-shot and fine-tuned GPT. Our findings indicate that: (1) English has the highest performance and the code-switching scenario of English-Tamil being the lowest, (2) fine-tuning improves performance consistently across languages whilst translation yields mixed results. Through simple experimentation with original text and machine-translated text for hate speech detection along with a qualitative error analysis, this paper sheds light on the socio-cultural nuances and complexities of languages that may not be captured by automatic translation.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2410.1123

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > Virginia (0.04)
North America > Canada > Ontario > Toronto (0.04)
(6 more...)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Add feedback

MQM-Chat: Multidimensional Quality Metrics for Chat Translation

Li, Yunmeng, Suzuki, Jun, Morishita, Makoto, Abe, Kaori, Inui, Kentaro

arXiv.org Artificial IntelligenceAug-29-2024

The complexities of chats pose significant challenges for machine translation models. Recognizing the need for a precise evaluation metric to address the issues of chat translation, this study introduces Multidimensional Quality Metrics for Chat Translation (MQM-Chat). Through the experiments of five models using MQM-Chat, we observed that all models generated certain fundamental errors, while each of them has different shortcomings, such as omission, overly correcting ambiguous source content, and buzzword issues, resulting in the loss of stylized information. Our findings underscore the effectiveness of MQM-Chat in evaluating chat translation, emphasizing the importance of stylized content and dialogue consistency for future studies.

chat translation, computational linguistic, translation, (11 more...)

arXiv.org Artificial Intelligence

2408.1639

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.05)
Europe > Belgium (0.04)
Asia > Japan > Honshū > Tōhoku (0.04)
(11 more...)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.99)

Add feedback

An Investigation of Warning Erroneous Chat Translations in Cross-lingual Communication

Li, Yunmeng, Suzuki, Jun, Morishita, Makoto, Abe, Kaori, Inui, Kentaro

arXiv.org Artificial IntelligenceAug-28-2024

participant, translation, warning message, (11 more...)

arXiv.org Artificial Intelligence

doi: 10.18653/v1/2023.ijcnlp-srw.2

2408.15543

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Europe > Belgium (0.05)
Oceania > Australia > Victoria > Melbourne (0.04)
(6 more...)

Genre: Research Report > New Finding (0.86)

Industry: Information Technology (0.68)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Add feedback

Shortcomings of LLMs for Low-Resource Translation: Retrieval and Understanding are Both the Problem

Court, Sara, Elsner, Micha

arXiv.org Artificial IntelligenceJun-21-2024

This work investigates the in-context learning abilities of pretrained large language models (LLMs) when instructed to translate text from a low-resource language into a high-resource language as part of an automated machine translation pipeline. We conduct a set of experiments translating Southern Quechua to Spanish and examine the informativity of various types of information retrieved from a constrained database of digitized pedagogical materials (dictionaries and grammar lessons) and parallel corpora. Using both automatic and human evaluation of model output, we conduct ablation studies that manipulate (1) context type (morpheme translations, grammar descriptions, and corpus examples), (2) retrieval methods (automated vs. manual), and (3) model type. Our results suggest that even relatively small LLMs are capable of utilizing prompt context for zero-shot low-resource translation when provided a minimally sufficient amount of relevant linguistic information. However, the variable effects of prompt type, retrieval method, model type, and language-specific factors highlight the limitations of using even the best LLMs as translation systems for the majority of the world's 7,000+ languages and their speakers.

computational linguistic, information, translation, (15 more...)

arXiv.org Artificial Intelligence

2406.15625

Country:

North America > Canada > Ontario > Toronto (0.05)
North America > United States > Ohio (0.04)
Europe > Italy > Tuscany > Florence (0.04)
(13 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Cyber Risks of Machine Translation Critical Errors : Arabic Mental Health Tweets as a Case Study

Saadany, Hadeel, Tantawy, Ashraf, Orasan, Constantin

arXiv.org Artificial IntelligenceMay-19-2024

With the advent of Neural Machine Translation (NMT) systems, the MT output has reached unprecedented accuracy levels which resulted in the ubiquity of MT tools on almost all online platforms with multilingual content. However, NMT systems, like other state-of-the-art AI generative systems, are prone to errors that are deemed machine hallucinations. The problem with NMT hallucinations is that they are remarkably \textit{fluent} hallucinations. Since they are trained to produce grammatically correct utterances, NMT systems are capable of producing mistranslations that are too fluent to be recognised by both users of the MT tool, as well as by automatic quality metrics that are used to gauge their performance. In this paper, we introduce an authentic dataset of machine translation critical errors to point to the ethical and safety issues involved in the common use of MT. The dataset comprises mistranslations of Arabic mental health postings manually annotated with critical error types. We also show how the commonly used quality metrics do not penalise critical errors and highlight this as a critical issue that merits further attention from researchers.

critical error, mistranslation, translation, (15 more...)

arXiv.org Artificial Intelligence

2405.11668

Country:

North America > United States > New York (0.04)
North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
Europe > United Kingdom > England > Surrey (0.04)
(7 more...)

Genre: Research Report (1.00)

Industry:

Information Technology (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.69)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Add feedback

An approach for mistranslation removal from popular dataset for Indic MT Task

Das, Sudhansu Bala, Rodrigues, Leo Raphael, Mishra, Tapas Kumar, Patra, Bidyut Kr.

arXiv.org Artificial IntelligenceJan-12-2024

The conversion of content from one language to another utilizing a computer system is known as Machine Translation (MT). Various techniques have come up to ensure effective translations that retain the contextual and lexical interpretation of the source language. End-to-end Neural Machine Translation (NMT) is a popular technique and it is now widely used in real-world MT systems. Massive amounts of parallel datasets (sentences in one language alongside translations in another) are required for MT systems. These datasets are crucial for an MT system to learn linguistic structures and patterns of both languages during the training phase. One such dataset is Samanantar, the largest publicly accessible parallel dataset for Indian languages (ILs). Since the corpus has been gathered from various sources, it contains many incorrect translations. Hence, the MT systems built using this dataset cannot perform to their usual potential. In this paper, we propose an algorithm to remove mistranslations from the training corpus and evaluate its performance and efficiency. Two Indic languages (ILs), namely, Hindi (HIN) and Odia (ODI) are chosen for the experiment. A baseline NMT system is built for these two ILs, and the effect of different dataset sizes is also investigated. The quality of the translations in the experiment is evaluated using standard metrics such as BLEU, METEOR, and RIBES. From the results, it is observed that removing the incorrect translation from the dataset makes the translation quality better. It is also noticed that, despite the fact that the ILs-English and English-ILs systems are trained using the same corpus, ILs-English works more effectively across all the evaluation metrics.

dataset, machine translation, translation, (14 more...)

arXiv.org Artificial Intelligence

2401.06398

Country:

Asia > India > Odisha (0.14)
North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > India > Uttar Pradesh (0.04)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Add feedback

Explaining CLIP through Co-Creative Drawings and Interaction

Guljajeva, Varvara, Solà, Mar Canet, Clarke, Isaac Joseph

arXiv.org Artificial IntelligenceJun-12-2023

This paper analyses a visual archive of drawings produced by an interactive robotic art installation where audience members narrated their dreams into a system powered by CLIPdraw deep learning (DL) model that interpreted and transformed their dreams into images. The resulting archive of prompt-image pairs were examined and clustered based on concept representation accuracy. As a result of the analysis, the paper proposes four groupings for describing and explaining CLIP-generated results: clear concept, text-to-text as image, indeterminacy and confusion, and lost in translation. This article offers a glimpse into a collection of dreams interpreted, mediated and given form by Artificial Intelligence (AI), showcasing oftentimes unexpected, visually compelling or, indeed, the dream-like output of the system, with the emphasis on processes and results of translations between languages, sign-systems and various modules of the installation. In the end, the paper argues that proposed clusters support better understanding of the neural model.

machine learning, natural language, translation, (20 more...)

arXiv.org Artificial Intelligence

2306.07429

Country:

North America > United States > New York > New York County > New York City (0.05)
North America > United States > New York > Nassau County > Mineola (0.04)
North America > United States > Indiana (0.04)
(4 more...)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback