AITopics

Large language models (LLMs) have emerged as strong contenders in machine translation.Yet, they still struggle to adequately handle discourse phenomena, such as pronoun resolution and lexical cohesion at the document level. In this study, we thoroughly investigate the discourse phenomena performance of LLMs in context-aware translation. We demonstrate that discourse knowledge is encoded within LLMs and propose the use of quality-aware decoding (QAD) to effectively extract this knowledge, showcasing its superiority over other decoding approaches through comprehensive analysis. Furthermore, we illustrate that QAD enhances the semantic richness of translations and aligns them more closely with human preferences.

computational linguistic, large language model, natural language, (14 more...)

2510.06866

Country:

Europe (1.00)
Asia (1.00)
North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (0.87)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Filandrianos, Giorgos, Mastromichalakis, Orfeas Menis, Mohammed, Wafaa, Attanasio, Giuseppe, Zerva, Chrysoula

GAMBIT+: A Challenge Set for Evaluating Gender Bias in Machine Translation Quality Estimation Metrics

Gender bias in machine translation (MT) systems has been extensively documented, but bias in automatic quality estimation (QE) metrics remains comparatively underexplored. Existing studies suggest that QE metrics can also exhibit gender bias, yet most analyses are limited by small datasets, narrow occupational coverage, and restricted language variety. To address this gap, we introduce a large-scale challenge set specifically designed to probe the behavior of QE metrics when evaluating translations containing gender-ambiguous occupational terms. Building on the GAMBIT corpus of English texts with gender-ambiguous occupations, we extend coverage to three source languages that are genderless or natural-gendered, and eleven target languages with grammatical gender, resulting in 33 source-target language pairs. Each source text is paired with two target versions differing only in the grammatical gender of the occupational term(s) (masculine vs. feminine), with all dependent grammatical elements adjusted accordingly. An unbiased QE metric should assign equal or near-equal scores to both versions. The dataset's scale, breadth, and fully parallel design, where the same set of texts is aligned across all languages, enables fine-grained bias analysis by occupation and systematic comparisons across languages.

artificial intelligence, machine translation, natural language, (14 more...)

2510.06841

Country:

Europe (1.00)
North America > United States (0.68)
Asia > Middle East > UAE (0.14)

Genre: Research Report > Experimental Study (0.48)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Xiong, Jiafeng, Zhao, Yuting

GIIFT: Graph-guided Inductive Image-free Multimodal Machine Translation

Multimodal Machine Translation (MMT) has demonstrated the significant help of visual information in machine translation. However, existing MMT methods face challenges in leveraging the modality gap by enforcing rigid visual-linguistic alignment whilst being confined to inference within their trained multimodal domains. In this work, we construct novel multimodal scene graphs to preserve and integrate modality-specific information and introduce GIIFT, a two-stage Graph-guided Inductive Image-Free MMT framework that uses a cross-modal Graph Attention Network adapter to learn multimodal knowledge in a unified fused space and inductively generalize it to broader image-free translation domains. Experimental results on the Multi30K dataset of English-to-French and English-to-German tasks demonstrate that our GIIFT surpasses existing approaches and achieves the state-of-the-art, even without images during inference. Results on the WMT benchmark show significant improvements over the image-free translation baselines, demonstrating the strength of GIIFT towards inductive image-free inference.

artificial intelligence, machine translation, natural language, (12 more...)

2507.18562

Country:

North America > United States (1.00)
Europe (1.00)

Genre:

Research Report (0.64)
Overview (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

GlotEval: A Test Suite for Massively Multilingual Evaluation of Large Language Models

Luo, Hengyu, Li, Zihao, Attieh, Joseph, Devkota, Sawal, de Gibert, Ona, Huang, Xu, Ji, Shaoxiong, Lin, Peiqin, Mantina, Bhavani Sai Praneeth Varma, Sreenidhi, Ananda, Vázquez, Raúl, Wang, Mengjie, Yusofi, Samea, Yuan, Fei, Tiedemann, Jörg

Large language models (LLMs) are advancing at an unprecedented pace globally, with regions increasingly adopting these models for applications in their primary language. Evaluation of these models in diverse linguistic environments, especially in low-resource languages, has become a major challenge for academia and industry. Existing evaluation frameworks are disproportionately focused on English and a handful of high-resource languages, thereby overlooking the realistic performance of LLMs in multilingual and lower-resource scenarios. To address this gap, we introduce GlotEval, a lightweight framework designed for massively multilingual evaluation. Supporting seven key tasks (machine translation, text classification, summarization, open-ended generation, reading comprehension, sequence labeling, and intrinsic evaluation), spanning over dozens to hundreds of languages, GlotEval highlights consistent multilingual benchmarking, language-specific prompt templates, and non-English-centric machine translation. This enables a precise diagnosis of model strengths and weaknesses in diverse linguistic contexts. A multilingual translation case study demonstrates GlotEval's applicability for multilingual and language-specific evaluations.

large language model, machine learning, natural language, (14 more...)

2504.04155

Country:

Asia > China (0.46)
North America > United States (0.29)
Europe > Finland (0.28)
Europe > Germany (0.28)

Genre: Research Report (0.82)

Industry: Education (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Neural Information Processing SystemsOct-8-2025, 22:15:34 GMT

7571c9d44179c7988178593c5b62a9b6-Paper-Conference.pdf

artificial intelligence, machine learning, natural language, (18 more...)

Country:

Africa > Sudan (0.04)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
Africa > South Sudan > Equatoria > Central Equatoria > Juba (0.04)
(13 more...)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.46)

Neural Information Processing SystemsOct-8-2025, 22:13:22 GMT

74bb24dca8334adce292883b4b651eda-Paper-Conference.pdf

large language model, machine learning, natural language, (17 more...)

Country:

North America > Haiti (0.14)
Asia > Philippines > Luzon > Ilocos Region > Province of Pangasinan (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)
(38 more...)

Genre: Overview (0.46)

Technology:

Information Technology > Communications (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.69)
(2 more...)

Neural Information Processing SystemsOct-8-2025, 20:46:20 GMT

From Pixels to UI Actions: Learning to Follow Instructions via Graphical User Interfaces Peter Shaw

Much of the previous work towards digital agents for graphical user interfaces (GUIs) has relied on text-based representations (derived from HTML or other structured data sources), which are not always readily available.

demonstration, machine learning, reinforcement learning, (21 more...)

Country: North America > United States (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Information Technology > Security & Privacy (0.67)
Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Communications (1.00)
(4 more...)

Neural Information Processing SystemsOct-8-2025, 20:27:36 GMT

On the Pareto Front of Multilingual Neural Machine Translation Liang Chen 1 Shuming Ma

In this work, we study how the performance of a given direction changes with its sampling ratio in Multilingual Neural Machine Translation (MNMT). By training over 200 multilingual models with various model sizes, data sizes, and language directions, we find it interesting that the performance of certain translation direction does not always improve with the increase of its weight in the multi-task optimization objective. Accordingly, scalarization method leads to a multitask trade-off front that deviates from the traditional Pareto front when there exists data imbalance in the training corpus, which poses a great challenge to improve the overall performance of all directions. Based on our observations, we propose the Double Power Law to predict the unique performance trade-off front in MNMT, which is robust across various languages, data adequacy, and the number of tasks. Finally, we formulate the sample ratio selection problem in MNMT as an optimization problem based on the Double Power Law. In our experiments, it achieves better performance than temperature searching and gradient manipulation methods with only 1/5 to 1/2 of the total training budget.

artificial intelligence, low-resource direction, natural language, (14 more...)