AITopics

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Neural Information Processing SystemsMay-27-2025, 11:07:03 GMT

QUEST: Quality-Aware Metropolis-Hastings Sampling for Machine Translation

An important challenge in machine translation (MT) is to generate high-quality and diverse translations. Prior work has shown that the estimated likelihood from the MT model correlates poorly with translation quality. In contrast, quality evaluation metrics (such as COMET or BLEURT) exhibit high correlations with human judgments, which has motivated their use as rerankers (such as quality-aware and minimum Bayes risk decoding). However, relying on a single translation with high estimated quality increases the chances of "gaming the metric''. In this paper, we address the problem of sampling a set of high-quality and diverse translations. We provide a simple and effective way to avoid over-reliance on noisy quality estimates by using them as the energy function of a Gibbs distribution.

machine translation, quality-aware metropolis-hasting sampling, translation, (2 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Neural Information Processing SystemsMay-27-2025, 10:01:28 GMT

Toward Efficient Inference for Mixture of Experts

Mixture-of-Experts (MoE) models have recently gained steam in achieving the state-of-the-art performance in a wide range of tasks in computer vision and natural language processing. They effectively expand the model capacity while incurring a minimal increase in computation cost during training. However, deploying such models for inference is difficult due to their large model size and complex communication pattern. In this work, we provide a characterization of two MoE workloads, namely Language Modeling (LM) and Machine Translation (MT) and identify their sources of inefficiencies at deployment. We propose three optimization techniques to mitigate sources of inefficiencies, namely (1) Dynamic gating, (2) Expert Buffering, and (3) Expert load balancing.

efficient inference, expert buffering, mixture, (2 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.99)

Neural Information Processing SystemsMay-27-2025, 03:25:56 GMT

Efficient Minimum Bayes Risk Decoding using Low-Rank Matrix Completion Algorithms

Minimum Bayes Risk (MBR) decoding is a powerful decoding strategy widely used for text generation tasks but its quadratic computational complexity limits its practical application. This paper presents a novel approach for approximating MBR decoding using matrix completion techniques, focusing on a machine translation task. We formulate MBR decoding as a matrix completion problem, where the utility metric scores between candidate hypotheses and reference translations form a low-rank matrix. First we empirically show that the scores matrices indeed have a low-rank structure. Then we exploit this by only computing a random subset of the scores and efficiently recover the missing entries in the matrix by applying the Alternating Least Squares (ALS) algorithm, thereby enabling fast approximation of the MBR decoding process.

artificial intelligence, machine translation, natural language, (3 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.48)

How Well Do Large Reasoning Models Translate? A Comprehensive Evaluation for Multi-Domain Machine Translation

Ye, Yongshi, Fu, Biao, Huang, Chongxuan, Chen, Yidong, Shi, Xiaodong

Large language models (LLMs) have demonstrated strong performance in general-purpose machine translation, but their effectiveness in complex, domain-sensitive translation tasks remains underexplored. Recent advancements in Large Reasoning Models (LRMs), raise the question of whether structured reasoning can enhance translation quality across diverse domains. In this work, we compare the performance of LRMs with traditional LLMs across 15 representative domains and four translation directions. Our evaluation considers various factors, including task difficulty, input length, and terminology density. We use a combination of automatic metrics and an enhanced MQM-based evaluation hierarchy to assess translation quality. Our findings show that LRMs consistently outperform traditional LLMs in semantically complex domains, especially in long-text and high-difficulty translation scenarios. Moreover, domain-adaptive prompting strategies further improve performance by better leveraging the reasoning capabilities of LRMs. These results highlight the potential of structured reasoning in MDMT tasks and provide valuable insights for optimizing translation systems in domain-sensitive contexts.

large language model, machine learning, natural language, (16 more...)

2505.19987

Country:

Asia > Singapore (0.04)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
North America > United States > Florida > Miami-Dade County > Miami (0.04)
(11 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

MT$^{3}$: Scaling MLLM-based Text Image Machine Translation via Multi-Task Reinforcement Learning

Feng, Zhaopeng, Liang, Yupu, Cao, Shaosheng, Su, Jiayuan, Ren, Jiahan, Xu, Zhe, Hu, Yao, Huang, Wenxuan, Wu, Jian, Liu, Zuozhu

Text Image Machine Translation (TIMT)-the task of translating textual content embedded in images-is critical for applications in accessibility, cross-lingual information access, and real-world document understanding. However, TIMT remains a complex challenge due to the need for accurate optical character recognition (OCR), robust visual-text reasoning, and high-quality translation, often requiring cascading multi-stage pipelines. Recent advances in large-scale Reinforcement Learning (RL) have improved reasoning in Large Language Models (LLMs) and Multimodal LLMs (MLLMs), but their application to end-to-end TIMT is still underexplored. To bridge this gap, we introduce MT$^{3}$, the first framework to apply Multi-Task RL to MLLMs for end-to-end TIMT. MT$^{3}$ adopts a multi-task optimization paradigm targeting three key sub-skills: text recognition, context-aware reasoning, and translation. It is trained using a novel multi-mixed reward mechanism that adapts rule-based RL strategies to TIMT's intricacies, offering fine-grained, non-binary feedback across tasks. Furthermore, to facilitate the evaluation of TIMT in authentic cross-cultural and real-world social media contexts, we introduced XHSPost, the first social media TIMT benchmark. Our MT$^{3}$-7B-Zero achieves state-of-the-art results on the latest in-domain MIT-10M benchmark, outperforming strong baselines such as Qwen2.5-VL-72B and InternVL2.5-78B by notable margins across multiple metrics. Additionally, the model shows strong generalization to out-of-distribution language pairs and datasets. In-depth analyses reveal how multi-task synergy, reinforcement learning initialization, curriculum design, and reward formulation contribute to advancing MLLM-driven TIMT.

large language model, machine learning, translation, (18 more...)

2505.19714

Country: Asia (0.46)

Genre: Research Report > New Finding (0.46)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Shim, Ryan Soh-Eun, De Cristofaro, Domenico, Hu, Chengzhi Martin, Vietti, Alessandro, Plank, Barbara

Languages in Multilingual Speech Foundation Models Align Both Phonetically and Semantically

Cross-lingual alignment in pretrained language models (LMs) has enabled efficient transfer in text-based LMs. Such an alignment has also been observed in speech foundation models. However, it remains an open question whether findings and methods from text-based cross-lingual alignment apply to speech. Building on prior work on spoken translation retrieval, we perform pronunciation-controlled experiments to observe if cross-lingual alignment can indeed occur in such models on a semantic basis, instead of relying on phonetic similarities. Our findings indicate that even in the absence of phonetic cues, spoken translation retrieval accuracy remains relatively stable. We follow up with a controlled experiment on a word-level dataset of cross-lingual synonyms and near-homophones, confirming the existence of both phonetic and semantic knowledge in the encoder. Finally, we qualitatively examine the transcriptions produced by early exiting the encoder, where we observe that speech translation produces semantic errors that are characterized by phonetic similarities to corresponding words in the source language. We apply this insight from early exiting to speech recognition in seven low-resource languages unsupported by the Whisper model, and achieve improved accuracy in all languages examined, particularly for languages with transparent orthographies.

artificial intelligence, natural language, text processing, (18 more...)

2505.19606

Country:

Europe (0.93)
North America > Mexico (0.28)
North America > United States > Minnesota (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.90)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.90)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.69)

Shetty, Ahan Prasannakumar

Evaluating Machine Translation Models for English-Hindi Language Pairs: A Comparative Analysis

--Machine translation has become a critical tool in bridging linguistic gaps, especially between languages as diverse as English and Hindi. This paper comprehensively evaluates various machine translation models for translating between English and Hindi. We assess the performance of these models using a diverse set of automatic evaluation metrics, both lexical and machine learning-based metrics. The study aims to provide insights into the effectiveness of different machine translation approaches in handling both general and specialized language domains. Results indicate varying performance levels across different metrics, highlighting strengths and areas for improvement in current translation systems.

artificial intelligence, natural language, translation, (13 more...)

2505.19604

Country: Asia > India (0.14)

Genre: Research Report (0.50)

Industry:

Health & Medicine (0.47)
Government (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Multimodal Machine Translation with Visual Scene Graph Pruning

Lu, Chenyu, Sun, Shiliang, Zhao, Jing, Zhang, Nan, Song, Tengfei, Yang, Hao

Multimodal machine translation (MMT) seeks to address the challenges posed by linguistic polysemy and ambiguity in translation tasks by incorporating visual information. A key bottleneck in current MMT research is the effective utilization of visual data. Previous approaches have focused on extracting global or region-level image features and using attention or gating mechanisms for multimodal information fusion. However, these methods have not adequately tackled the issue of visual information redundancy in MMT, nor have they proposed effective solutions. In this paper, we introduce a novel approach--multimodal machine translation with visual Scene Graph Pruning (PSG), which leverages language scene graph information to guide the pruning of redundant nodes in visual scene graphs, thereby reducing noise in downstream translation tasks. Through extensive comparative experiments with state-of-the-art methods and ablation studies, we demonstrate the effectiveness of the PSG model. Our results also highlight the promising potential of visual information pruning in advancing the field of MMT.

artificial intelligence, natural language, translation, (14 more...)

2505.19507

Genre:

Research Report > New Finding (0.66)
Research Report > Promising Solution (0.54)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Jagadeeshan, Manoj Balaji, Raj, Prince, Goyal, Pawan

Anveshana: A New Benchmark Dataset for Cross-Lingual Information Retrieval On English Queries and Sanskrit Documents

The study presents a comprehensive benchmark for retrieving Sanskrit documents using English queries, focusing on the chapters of the Srimadbhagavatam. It employs a tripartite approach: Direct Retrieval (DR), Translation-based Retrieval (DT), and Query Translation (QT), utilizing shared embedding spaces and advanced translation methods to enhance retrieval systems in a RAG framework. The study fine-tunes state-of-the-art models for Sanskrit's linguistic nuances, evaluating models such as BM25, REPLUG, mDPR, ColBERT, Contriever, and GPT-2. It adapts summarization techniques for Sanskrit documents to improve QA processing. Evaluation shows DT methods outperform DR and QT in handling the cross-lingual challenges of ancient texts, improving accessibility and understanding. A dataset of 3,400 English-Sanskrit query-document pairs underpins the study, aiming to preserve Sanskrit scriptures and share their philosophical importance widely. Our dataset is publicly available at https://huggingface.co/datasets/manojbalaji1/anveshana

information retrieval, large language model, machine learning, (19 more...)

2505.19494

Country: Asia > India (0.28)

Genre:

Research Report > New Finding (0.93)
Research Report > Promising Solution (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.94)
(2 more...)