AITopics | Machine Translation

Machine Translation

"Machine translation (MT) is the application of computers to the task of translating texts from one natural language to another. One of the very earliest pursuits in computer science, MT has proved to be an elusive goal, but today a number of systems are available which produce output which, if not perfect, is of sufficient quality to be useful in a number of specific domains."
– Definition from the European Association for Machine Translation (EAMT).

You can translate text of your choice by using free translators such as: CAPITA, Google Translate, SDL International, SYSTRAN.

Trans-Zero: Self-Play Incentivizes Large Language Models for Multilingual Translation Without Parallel Data

Zou, Wei, Yang, Sen, Bao, Yu, Huang, Shujian, Chen, Jiajun, Cheng, Shanbo

arXiv.org Artificial Intelligence, May-20-2025

The rise of Large Language Models (LLMs) has reshaped machine translation (MT), but multilingual MT still relies heavily on parallel data for supervised fine-tuning (SFT), facing challenges like data scarcity for low-resource languages and catastrophic forgetting. To address these issues, we propose TRANS-ZERO, a self-play framework that leverages only monolingual data and the intrinsic multilingual knowledge of LLM. TRANS-ZERO combines Genetic Monte-Carlo Tree Search (G-MCTS) with preference optimization, achieving strong translation performance that rivals supervised methods. Experiments demonstrate that this approach not only matches the performance of models trained on large-scale parallel data but also excels in non-English translation directions. Further analysis reveals that G-MCTS itself significantly enhances translation quality by exploring semantically consistent candidates through iterative translations, providing a robust foundation for the framework's success.

large language model, machine learning, translation, (18 more...)

2504.14669

Country:

Asia (1.00)
North America > Mexico > Mexico City (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Evaluating Menu OCR and Translation: A Benchmark for Aligning Human and Automated Evaluations in Large Vision-Language Models

Wu, Zhanglin, Song, Tengfei, Xie, Ning, Zhu, Mengli, Zhang, Weidong, Wu, Shuang, Li, Pengfei, Li, Chong, Zhu, Junhao, Yang, Hao, Sun, Shiliang

arXiv.org Artificial Intelligence, May-20-2025

The rapid advancement of large vision-language models (LVLMs) has significantly propelled applications in document understanding, particularly in optical character recognition (OCR) and multilingual translation. However, current evaluations of LVLMs, like the widely used OCRBench, mainly focus on verifying the correctness of their short-text responses and long-text responses with simple layout, while the evaluation of their ability to understand long texts with complex layout design is highly significant but largely overlooked. In this paper, we propose Menu OCR and Translation Benchmark (MOTBench), a specialized evaluation framework emphasizing the pivotal role of menu translation in cross-cultural communication. MOTBench requires LVLMs to accurately recognize and translate each dish, along with its price and unit items on a menu, providing a comprehensive assessment of their visual understanding and language processing capabilities. Our benchmark is comprised of a collection of Chinese and English menus, characterized by intricate layouts, a variety of fonts, and culturally specific elements across different languages, along with precise human annotations. Experiments show that our automatic evaluation results are highly consistent with professional human evaluation. We evaluate a range of publicly available state-of-the-art LVLMs and analyze their output to identify the strengths and weaknesses in their performance, offering valuable insights to guide future advancements in LVLM development. MOTBench is available at https://github.com/gitwzl/MOTBench.

large language model, machine learning, translation, (22 more...)

2504.13945

Country: Asia > China (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

Towards Cultural Bridge by Bahnaric-Vietnamese Translation Using Transfer Learning of Sequence-To-Sequence Pre-training Language Model

Dat, Phan Tran Minh, Khang, Vo Hoang Nhat, Tho, Quan Thanh

arXiv.org Artificial Intelligence, May-19-2025

This work explores the journey towards achieving Bahnaric-Vietnamese translation for the sake of culturally bridging the two ethnic groups in Vietnam. However, translating from Bahnaric to Vietnamese also encounters some difficulties. The most prominent challenge is the lack of available original resources for the Bahnaric source language, including vocabulary, grammar, dialogue patterns and bilingual corpus, which hinders the data collection process for training. To address this, we leverage a transfer learning approach using a sequence-to-sequence pre-trained language model. First of all, we leverage a pre-trained Vietnamese language model to capture the characteristics of this language. In particular, to further serve the purpose of machine translation, we aim for a sequence-to-sequence model, not encoder-only like BERT or decoder-only like GPT. Taking advantage of the significant similarity between the two languages, we continue training the model with the currently limited bilingual resources of Vietnamese-Bahnaric text to perform the transfer learning from language model to machine translation. Thus, this approach can help to handle the problem of imbalanced resources between the two languages, while also optimizing the training and computational processes. Additionally, we enhanced the datasets using data augmentation to generate additional resources and defined some heuristic methods to make the translation more precise. Our approach has been validated to be highly effective for the Bahnaric-Vietnamese translation model, contributing to the expansion and preservation of languages, and facilitating better mutual understanding between the two ethnic groups.

machine learning, natural language, translation, (18 more...)

2505.11421

Country:

Asia > Vietnam > Hồ Chí Minh City > Hồ Chí Minh City (0.06)
Asia > Vietnam > Bình Định Province (0.05)
Asia > Vietnam > Kon Tum Province > Kon Tum (0.05)
Asia > Vietnam > Gia Lai Province (0.05)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Low-Resource Language Processing: An OCR-Driven Summarization and Translation Pipeline

Madhavi, Hrishit, Cherian, Jacob, Khamkar, Yuvraj, Bhagat, Dhananjay

arXiv.org Artificial Intelligence, May-19-2025

With the abundance of information in today's digital world, it is a major challenge to process voluminous text from news articles, reports, and web pages in an efficient manner. Text summarization solves this problem by providing brief, informative summaries of lengthy documents, saving end-users both time and mental effort [1]. Whereas traditional summarization methods involve only extractive approaches (identifying major sentences out of the source text) and abstractive approaches (producing new sentences capturing the core meaning), the current project outlines a holistic, multi-step NLP pipeline extending beyond mere summarization efforts [1]. The pipeline starts with Optical Character Recognition (OCR), which is achieved with Tesseract (Pytesseract). This module yields machine-readable text from images and handles various languages such as English, Hindi, Tamil, Urdu, Bengali, and Telugu [1]. The extracted information then passes through a chain of Natural Language Processing (NLP) and Machine Learning (ML) modules for more in-depth text analysis. The main elements of this pipeline are: The system combines state-of-the-art NLP features to boost text comprehension and processing.

large language model, machine learning, natural language, (14 more...)

2505.11177

Country: Asia > India > Maharashtra > Pune (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.73)
(2 more...)

Reconstructing Syllable Sequences in Abugida Scripts with Incomplete Inputs

Thu, Ye Kyaw, Oo, Thazin Myint

arXiv.org Artificial Intelligence, May-19-2025

This paper explores syllable sequence prediction in Abugida languages using Transformer-based models, focusing on six languages: Bengali, Hindi, Khmer, Lao, Myanmar, and Thai, from the Asian Language Treebank (ALT) dataset. We investigate the reconstruction of complete syllable sequences from various incomplete input types, including consonant sequences, vowel sequences, partial syllables (with random character deletions), and masked syllables (with fixed syllable deletions). Our experiments reveal that consonant sequences play a critical role in accurate syllable prediction, achieving high BLEU scores, while vowel sequences present a significantly greater challenge. The model demonstrates robust performance across tasks, particularly in handling partial and masked syllable reconstruction, with strong results for tasks involving consonant information and syllable masking. This study advances the understanding of sequence prediction for Abugida languages and provides practical insights for applications such as text prediction, spelling correction, and data augmentation in these scripts.
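
To make two of the incomplete input types in this abstract concrete, the Python sketch below builds masked-syllable and partial-syllable inputs from a syllable-segmented sentence. The abstract does not specify the deletion scheme, so the function names, the every-n-th masking rule, the `_` mask token, the deletion rate, and the Latin placeholder syllables are all assumptions for illustration, not the authors' preprocessing.

```python
import random
from typing import List


def mask_syllables(syllables: List[str], step: int = 2, mask: str = "_") -> List[str]:
    """Replace every `step`-th syllable with a mask token (assumed fixed scheme)."""
    if step < 1:
        raise ValueError("step must be >= 1")
    return [mask if (i + 1) % step == 0 else s for i, s in enumerate(syllables)]


def delete_random_chars(syllables: List[str], rate: float = 0.3, seed: int = 0) -> List[str]:
    """Drop each character independently with probability `rate` (assumed scheme)."""
    rng = random.Random(seed)  # seeded so the corrupted inputs are reproducible
    return ["".join(ch for ch in s if rng.random() >= rate) for s in syllables]


# Placeholder syllables; real data would be syllable-segmented Abugida text.
sentence = ["ka", "ki", "ku", "ke"]
print(mask_syllables(sentence))       # masked-syllable input
print(delete_random_chars(sentence))  # partial-syllable input
```

A reconstruction model would then be trained to map each corrupted sequence back to the original `sentence`.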

large language model, machine learning, natural language, (20 more...)

2505.11008

Country:

Asia > Myanmar (0.29)
Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States (0.04)
(3 more...)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.96)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.36)

CaMDN: Enhancing Cache Efficiency for Multi-tenant DNNs on Integrated NPUs

Cai, Tianhao, Wang, Liang, Xiao, Limin, Han, Meng, Wang, Zeyu, Sun, Lin, Liao, Xiaojian

arXiv.org Artificial Intelligence, May-15-2025

With the rapid development of DNN applications, multi-tenant execution, where multiple DNNs are co-located on a single SoC, is becoming a prevailing trend. Although many methods are proposed in prior works to improve multi-tenant performance, the impact of shared cache is not well studied. This paper proposes CaMDN, an architecture-scheduling co-design to enhance cache efficiency for multi-tenant DNNs on integrated NPUs. Specifically, a lightweight architecture is proposed to support model-exclusive, NPU-controlled regions inside shared cache to eliminate unexpected cache contention. Moreover, a cache scheduling method is proposed to improve shared cache utilization. In particular, it includes a cache-aware mapping method for adaptability to the varying available cache capacity and a dynamic allocation algorithm to adjust the usage among co-located DNNs at runtime. Compared to prior works, CaMDN reduces the memory access by 33.4% on average and achieves a model speedup of up to 2.56$\times$ (1.88$\times$ on average).

artificial intelligence, machine learning, natural language, (18 more...)

2505.06625

Country: North America > United States > Minnesota (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Aya Vision: Advancing the Frontier of Multilingual Multimodality

Dash, Saurabh, Nan, Yiyang, Dang, John, Ahmadian, Arash, Singh, Shivalika, Smith, Madeline, Venkitesh, Bharat, Shmyhlo, Vlad, Aryabumi, Viraat, Beller-Morales, Walter, Pekmez, Jeremy, Ozuzu, Jason, Richemond, Pierre, Locatelli, Acyr, Frosst, Nick, Blunsom, Phil, Gomez, Aidan, Zhang, Ivan, Fadaee, Marzieh, Govindassamy, Manoj, Roy, Sudip, Gallé, Matthias, Ermis, Beyza, Üstün, Ahmet, Hooker, Sara

arXiv.org Artificial Intelligence, May-14-2025

Building multimodal language models is fundamentally challenging: it requires aligning vision and language modalities, curating high-quality instruction data, and avoiding the degradation of existing text-only capabilities once vision is introduced. These difficulties are further magnified in the multilingual setting, where the need for multimodal data in different languages exacerbates existing data scarcity, machine translation often distorts meaning, and catastrophic forgetting is more pronounced. To address the aforementioned challenges, we introduce novel techniques spanning both data and modeling. First, we develop a synthetic annotation framework that curates high-quality, diverse multilingual multimodal instruction data, enabling Aya Vision models to produce natural, human-preferred responses to multimodal inputs across many languages. Complementing this, we propose a cross-modal model merging technique that mitigates catastrophic forgetting, effectively preserving text-only capabilities while simultaneously enhancing multimodal generative performance. Aya-Vision-8B achieves best-in-class performance compared to strong multimodal models such as Qwen-2.5-VL-7B, Pixtral-12B, and even the much larger Llama-3.2-90B-Vision. We further scale this approach with Aya-Vision-32B, which outperforms models more than twice its size, such as Molmo-72B and Llama-3.2-90B-Vision. Our work advances multilingual progress on the multimodal frontier, and provides insights into techniques that effectively bend the need for compute while delivering extremely high performance.

arxiv preprint arxiv, large language model, machine learning, (19 more...)

2505.08751

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Europe > Spain (0.14)
Asia > India (0.04)
(33 more...)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment (0.92)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Graph Laplacian Wavelet Transformer via Learnable Spectral Decomposition

Kiruluta, Andrew, Lundy, Eric, Burity, Priscilla

arXiv.org Artificial Intelligence, May-14-2025

We introduce the Graph Wavelet Transformer (GWT), a novel architecture that replaces this bottleneck with a learnable, multi-scale wavelet transform defined over an explicit graph Laplacian derived from syntactic or semantic parses. By parameterizing K N bandpass filters in the graph Fourier domain, GWT achieves a linear-time mixing operator that simultaneously captures local syntactic dependencies and global semantic context. We provide a rigorous mathematical formulation of the spectral filtering and mixing process, integrate GWT modules into a standard Graph Transformer backbone, and evaluate on the WMT14 English-German translation benchmark. Empirical results demonstrate that GWT outperforms the baseline Graph Transformer by 0.8 BLEU, reduces parameter count by 7%, and speeds up inference by 15%. Our analysis shows that multi-scale spectral decomposition offers an interpretable, efficient, and expressive alternative to quadratic self-attention for graph-structured sequence modeling.
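
For readers unfamiliar with filtering in the graph Fourier domain, the block below gives the textbook form of spectral filtering on a graph Laplacian. It assumes an undirected graph, so the Laplacian is symmetric. It is background notation only: the symbols and the dense eigendecomposition are assumptions for exposition, and it does not describe how the paper parameterizes its filters or obtains its linear-time operator.

```latex
% Textbook graph spectral filtering (not the paper's exact operator).
% A: adjacency matrix of the parse graph, D: diagonal degree matrix,
% x: signal on the nodes, g_k: frequency response of the k-th bandpass filter.
\begin{align}
  L &= D - A = U \Lambda U^{\top}, \\
  \hat{x} &= U^{\top} x, \\
  y_k &= U \, g_k(\Lambda) \, U^{\top} x, \qquad k = 1, \dots, K.
\end{align}
```

Here the columns of U are the Laplacian eigenvectors, the diagonal of \Lambda holds the graph frequencies, and each g_k rescales those frequencies before the signal is mapped back to the nodes.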

artificial intelligence, machine learning, natural language, (16 more...)

2505.07862

Country: North America > United States (0.14)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Are We Paying Attention to Her? Investigating Gender Disambiguation and Attention in Machine Translation

Manna, Chiara, Alishahi, Afra, Blain, Frédéric, Vanmassenhove, Eva

arXiv.org Artificial Intelligence, May-14-2025

While gender bias in modern Neural Machine Translation (NMT) systems has received much attention, traditional evaluation metrics do not fully capture the extent to which these systems integrate contextual gender cues. We propose a novel evaluation metric called Minimal Pair Accuracy (MPA), which measures the reliance of models on gender cues for gender disambiguation. MPA is designed to go beyond surface-level gender accuracy metrics by focusing on whether models adapt to gender cues in minimal pairs -- sentence pairs that differ solely in the gendered pronoun, namely the explicit indicator of the target entity's gender in the source language (EN). We evaluate a number of NMT models on the English-Italian (EN--IT) language pair using this metric and show that they ignore available gender cues in most cases in favor of (statistical) stereotypical gender interpretation. We further show that in anti-stereotypical cases, these models tend to more consistently take masculine gender cues into account while ignoring the feminine cues. Furthermore, we analyze the attention head weights in the encoder component and show that while all models encode gender information to some extent, masculine cues elicit a more diffused response compared to the more concentrated and specialized responses to feminine gender cues.
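
As a rough illustration of the minimal-pair idea in this abstract, the Python sketch below counts a pair as correct only when the translation follows the pronoun cue in both members of the pair. The abstract does not give the formula, so this scoring rule, the function name `minimal_pair_accuracy`, and the "M"/"F" labels are assumptions, not the authors' definition.

```python
from typing import List, Tuple

# One item per minimal pair: the gender realized in the translation of the
# masculine-cue source sentence, then of the feminine-cue source sentence.
Pair = Tuple[str, str]


def minimal_pair_accuracy(predictions: List[Pair]) -> float:
    """Share of pairs where both translations follow their gender cue (assumed rule)."""
    if not predictions:
        raise ValueError("at least one minimal pair is required")
    correct = sum(1 for masc, fem in predictions if masc == "M" and fem == "F")
    return correct / len(predictions)


# Hypothetical model outputs for four minimal pairs.
preds = [("M", "F"), ("M", "M"), ("F", "F"), ("M", "F")]
print(minimal_pair_accuracy(preds))  # 0.5
```

Under this reading, a model that always produces the stereotypical gender regardless of the pronoun scores zero on every pair, even though a per-sentence gender accuracy would credit it for half of the sentences.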

computational linguistic, machine learning, natural language, (15 more...)

2505.08546

Country:

Europe (1.00)
Asia (1.00)
North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Translating the Grievance Dictionary: a psychometric evaluation of Dutch, German, and Italian versions

van der Vegt, Isabelle, Kleinberg, Bennett, Miotto, Marilu, Festor, Jonas

arXiv.org Artificial Intelligence, May-13-2025

This paper introduces and evaluates three translations of the Grievance Dictionary, a psycholinguistic dictionary for the analysis of violent, threatening or grievance-fuelled texts. Considering the relevance of these themes in languages beyond English, we translated the Grievance Dictionary to Dutch, German, and Italian. We describe the process of automated translation supplemented by human annotation. Psychometric analyses are performed, including internal reliability of dictionary categories and correlations with the LIWC dictionary. The Dutch and German translations perform similarly to the original English version, whereas the Italian dictionary shows low reliability for some categories. Finally, we make suggestions for further validation and application of the dictionary, as well as for future dictionary translations following a similar approach.

artificial intelligence, dictio, natural language, (16 more...)

2505.07495

Genre: Research Report (0.64)

Industry: Health & Medicine (0.47)

Technology:

Information Technology > Communications > Social Media (0.94)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.35)
