AITopics | human translation

human translation

Data Kernel Perspective Space Performance Guarantees for Synthetic Data from Transformer Models

Browder, Michael, Duh, Kevin, Harris, J. David, Lyzinski, Vince, McNamee, Paul, Park, Youngser, Priebe, Carey E., Viechnicki, Peter

arXiv.org Machine Learning, Feb-6-2026

Scarcity of labeled training data remains the long pole in the tent for building performant language technology and generative AI models. Transformer models -- particularly LLMs -- are increasingly being used to mitigate the data scarcity problem via synthetic data generation. However, because the models are black boxes, the properties of the synthetic data are difficult to predict. In practice it is common for language technology engineers to 'fiddle' with the LLM temperature setting and hope that what comes out the other end improves the downstream model. Faced with this uncertainty, here we propose Data Kernel Perspective Space (DKPS) to provide the foundation for mathematical analysis yielding concrete statistical guarantees for the quality of the outputs of transformer models. We first show the mathematical derivation of DKPS and how it provides performance guarantees. Next we show how DKPS performance guarantees can elucidate performance of a downstream task, such as neural machine translation models or LLMs trained using Contrastive Preference Optimization (CPO). Limitations of the current work and future research are also discussed.

large language model, machine learning, translation, (20 more...)

arXiv: 2602.05106

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Maryland > Prince George's County > College Park (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(9 more...)

Genre: Research Report (0.50)

Industry:

Education > Educational Setting > Higher Education (0.46)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.48)

Missing the human touch? A computational stylometry analysis of GPT-4 translations of online Chinese literature

Yao, Xiaofang, Kang, Yong-Bin, McCosker, Anthony

arXiv.org Artificial Intelligence, Nov-26-2025

Existing research suggests that machine translations of literary texts remain unsatisfactory. Such quality assessment often relies on automated metrics and subjective human ratings, with little attention to the stylistic features of machine translation. Empirical evidence is also scant on whether the advent of AI will transform the literary translation landscape, with implications for other critical domains for translation such as creative industries more broadly. This pioneering study investigates the stylistic features of AI translations, specifically examining GPT-4's performance against human translations in a Chinese online literature task. Our computational stylometry analysis reveals that GPT-4 translations closely mirror human translations in lexical, syntactic and content features. As such, AI translations can in fact replicate the 'human touch' in literary translation style. The study provides critical insights into the implications of AI on literary translation in the posthuman, where the line between machine and human translations may become increasingly blurry.

large language model, machine learning, translation, (20 more...)

doi: 10.1075/ts.24043.yao

arXiv: 2506.13013

Country:

Asia (0.46)
North America > United States (0.28)
Europe > Finland (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

LiTransProQA: an LLM-based Literary Translation evaluation metric with Professional Question Answering

Zhang, Ran, Zhao, Wei, Macken, Lieve, Eger, Steffen

arXiv.org Artificial Intelligence, Oct-14-2025

The impact of Large Language Models (LLMs) has extended into literary domains. However, existing evaluation metrics for literature prioritize mechanical accuracy over artistic expression and tend to overrate machine translation as being superior to human translation from experienced professionals. In the long run, this bias could result in an irreversible decline in translation quality and cultural authenticity. In response to the urgent need for a specialized literary evaluation metric, we introduce LITRANSPROQA, a novel, reference-free, LLM-based question-answering framework designed for literary translation evaluation. LITRANSPROQA integrates humans in the loop to incorporate insights from professional literary translators and researchers, focusing on critical elements in literary quality assessment such as literary devices, cultural understanding, and authorial voice. Our extensive evaluation shows that while literary-finetuned XCOMET-XL yields marginal gains, LITRANSPROQA substantially outperforms current metrics, achieving up to 0.07 gain in correlation and surpassing the best state-of-the-art metrics by over 15 points in adequacy assessments. Incorporating professional translator insights as weights further improves performance, highlighting the value of translator inputs. Notably, LITRANSPROQA reaches an adequacy performance comparable to trained linguistic student evaluators, though it still falls behind experienced professional translators. LITRANSPROQA shows broad applicability to open-source models like LLaMA3.3-70b and Qwen2.5-32b, indicating its potential as an accessible and training-free tool for evaluating literary translations that require local processing due to copyright or ethical considerations.

large language model, machine learning, translation, (19 more...)

arXiv: 2505.05423

Country:

North America > United States (1.00)
Europe (0.92)

Genre: Research Report > New Finding (0.67)

Industry: Leisure & Entertainment (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Contextual effects of sentiment deployment in human and machine translation

Comstock, Lindy, Sharma, Priyanshu, Belov, Mikhail

arXiv.org Artificial Intelligence, Feb-25-2025

This paper illustrates how the overall sentiment of a text may be shifted in translation and the implications for automated sentiment analyses, particularly those that utilize machine translation and assess findings via semantic similarity metrics. While human and machine translation will produce more lemmas that fit the expected frequency of sentiment in the target language, only machine translation will also reduce the overall semantic field of the text, particularly in regard to words with epistemic content.

lemma, machine translation, translation, (13 more...)

arXiv: 2502.18642

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.31)
Asia > Russia (0.29)
North America > United States > Indiana (0.04)

Genre: Research Report (0.83)

Industry: Government > Regional Government (0.31)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

WMT24++: Expanding the Language Coverage of WMT24 to 55 Languages & Dialects

Deutsch, Daniel, Briakou, Eleftheria, Caswell, Isaac, Finkelstein, Mara, Galor, Rebecca, Juraska, Juraj, Kovacs, Geza, Lui, Alison, Rei, Ricardo, Riesa, Jason, Rijhwani, Shruti, Riley, Parker, Salesky, Elizabeth, Trabelsi, Firas, Winkler, Stephanie, Zhang, Biao, Freitag, Markus

arXiv.org Artificial Intelligence, Feb-17-2025

As large language models (LLMs) become more and more capable in languages other than English, it is important to collect benchmark datasets in order to evaluate their multilingual performance, including on tasks like machine translation (MT). In this work, we extend the WMT24 dataset to cover 55 languages by collecting new human-written references and post-edits for 46 new languages and dialects in addition to post-edits of the references in 8 out of 9 languages in the original WMT24 dataset. The dataset covers four domains: literary, news, social, and speech. We benchmark a variety of MT providers and LLMs on the collected dataset using automatic metrics and find that LLMs are the best-performing MT systems in all 55 languages. These results should be confirmed using a human-based evaluation, which we leave for future work.

large language model, natural language, translation, (16 more...)

arXiv: 2502.12404

Country:

Europe (1.00)
Asia (1.00)
North America > United States (0.93)
Africa (0.68)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Sports (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Characterizing the Effects of Translation on Intertextuality using Multilingual Embedding Spaces

McGovern, Hope, Sirin, Hale, Lippincott, Tom

arXiv.org Artificial Intelligence, Jan-18-2025

Rhetorical devices are difficult to translate, but they are crucial to the translation of literary documents. We investigate the use of multilingual embedding spaces to characterize the preservation of intertextuality, one common rhetorical device, across human and machine translation. To do so, we use Biblical texts, which are both full of intertextual references and are highly translated works. We provide a metric to characterize intertextuality at the corpus level and provide a quantitative analysis of the preservation of this rhetorical device across extant human translations and machine-generated counterparts. We go on to provide qualitative analysis of cases wherein human translations over- or underemphasize the intertextuality present in the text, whereas machine translations provide a neutral baseline. This provides support for established scholarship proposing that human translators have a propensity to amplify certain literary characteristics of the original manuscripts.

artificial intelligence, natural language, translation, (15 more...)

arXiv: 2501.10731

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.28)
North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > New York (0.04)
(11 more...)

Genre: Research Report (0.65)

Industry: Education (0.35)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

A 2-step Framework for Automated Literary Translation Evaluation: Its Promises and Pitfalls

Shafayat, Sheikh, Yoon, Dongkeun, Jang, Woori, Choi, Jiwoo, Oh, Alice, Jung, Seohyon

arXiv.org Artificial Intelligence, Jan-1-2025

In this work, we propose and evaluate the feasibility of a two-stage pipeline to evaluate literary machine translation, in a fine-grained manner, from English to Korean. The results show that our framework provides fine-grained, interpretable metrics suited for literary translation and obtains a higher correlation with human judgment than traditional machine translation metrics. Nonetheless, it still fails to match interhuman agreement, especially in metrics like Korean Honorifics. We also observe that LLMs tend to favor translations generated by other LLMs, and we highlight the necessity of developing more sophisticated evaluation methods to ensure accurate and culturally sensitive machine translation of literary works.

criteria, original text, translation, (15 more...)

arXiv: 2412.0134

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Virginia (0.04)
Asia > India > West Bengal > Kolkata (0.04)
(5 more...)

Genre: Research Report > New Finding (0.65)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.67)
Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Can ChatGPT capture swearing nuances? Evidence from translating Arabic oaths

Shormani, Mohammed Q.

arXiv.org Artificial Intelligence, Dec-24-2024

This study sets out to answer one major question: Can ChatGPT capture swearing nuances? It presents an empirical study on the ability of ChatGPT to translate Arabic oath expressions into English. Thirty Arabic oath expressions were collected from the literature. These 30 oaths were first translated via ChatGPT and then analyzed and compared to the human translation in terms of the types of gaps left unfulfilled by ChatGPT. Specifically, the gaps involved are: religious gap, cultural gap, both religious and cultural gaps, no gap, using non-oath particles, redundancy, and non-capturing of Arabic script diacritics. It concludes that ChatGPT translation of oaths is still largely unsatisfactory, revealing the need for further development of ChatGPT and for the inclusion of Arabic training data covering oath expressions, oath nuances, rituals, and practices.

large language model, machine learning, translation, (20 more...)

arXiv: 2412.02466

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
North America > United States > Ohio > Portage County > Kent (0.04)
North America > United States > New York (0.04)
(4 more...)

Genre: Research Report (1.00)

Industry:

Information Technology (0.68)
Health & Medicine > Therapeutic Area > Immunology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Benchmarking GPT-4 against Human Translators: A Comprehensive Evaluation Across Languages, Domains, and Expertise Levels

Yan, Jianhao, Yan, Pingchuan, Chen, Yulong, Li, Jing, Zhu, Xianchao, Zhang, Yue

arXiv.org Artificial Intelligence, Nov-20-2024

This study presents a comprehensive evaluation of GPT-4's translation capabilities compared to human translators of varying expertise levels. Through systematic human evaluation using the MQM schema, we assess translations across three language pairs (Chinese↔English, Russian↔English, and Chinese↔Hindi) and three domains (News, Technology, and Biomedical). Our findings reveal that GPT-4 achieves performance comparable to junior-level translators in terms of total errors, while still lagging behind senior translators. Unlike traditional Neural Machine Translation systems, which show significant performance degradation in resource-poor language directions, GPT-4 maintains consistent translation quality across all evaluated language pairs. Through qualitative analysis, we identify distinctive patterns in translation approaches: GPT-4 tends toward overly literal translations and exhibits lexical inconsistency, while human translators sometimes over-interpret context and introduce hallucinations. This study represents the first systematic comparison between LLM and human translators across different proficiency levels, providing valuable insights into the current capabilities and limitations of LLM-based translation systems.

large language model, machine learning, natural language, (20 more...)

arXiv: 2411.13775

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Asia > China (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
(9 more...)

Genre: Research Report > New Finding (1.00)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

How Good Are LLMs for Literary Translation, Really? Literary Translation Evaluation with Humans and LLMs

Zhang, Ran, Zhao, Wei, Eger, Steffen

arXiv.org Artificial Intelligence, Oct-24-2024

Recent research has focused on literary machine translation (MT) as a new challenge in MT. However, the evaluation of literary MT remains an open problem. We contribute to this ongoing discussion by introducing LITEVAL-CORPUS, a paragraph-level parallel corpus comprising multiple verified human translations and outputs from 9 MT systems, which totals over 2k paragraphs and includes 13k annotated sentences across four language pairs, costing 4.5k Euro. This corpus enables us to (i) examine the consistency and adequacy of multiple annotation schemes, (ii) compare evaluations by students and professionals, and (iii) assess the effectiveness of LLM-based metrics. We find that Multidimensional Quality Metrics (MQM), as the de facto standard in non-literary human MT evaluation, is inadequate for literary translation: While Best-Worst Scaling (BWS) with students and Scalar Quality Metric (SQM) with professional translators prefer human translations at rates of ~82% and ~94%, respectively, MQM with student annotators prefers human professional translations over the translations of the best-performing LLMs in only ~42% of cases. While automatic metrics generally show a moderate correlation with human MQM and SQM, they struggle to accurately identify human translations, with rates of at most ~20%. Our overall evaluation indicates that human professional translations consistently outperform LLM translations, where even the most recent LLMs tend to produce more literal and less diverse translations compared to human translations. However, newer LLMs such as GPT-4o perform substantially better than older ones.

large language model, machine learning, translation, (19 more...)

arXiv: 2410.18697

Country:

Asia > Singapore (0.04)
Europe > United Kingdom > England > South Yorkshire > Sheffield (0.04)
Europe > Croatia > Dubrovnik-Neretva County > Dubrovnik (0.04)
(11 more...)

Genre: Research Report > New Finding (0.67)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)
