AITopics | mnmt model


mnmt model

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Registering Source Tokens to Target Language Spaces in Multilingual Neural Machine Translation

Qu, Zhi, Wang, Yiran, Mao, Jiannan, Ding, Chenchen, Tanaka, Hideki, Utiyama, Masao, Watanabe, Taro

arXiv.org Artificial Intelligence, Jan-6-2025

Multilingual neural machine translation (MNMT) enables arbitrary translations across multiple languages by training a model with limited parameters using parallel data only. However, the performance of such MNMT models still lags behind that of large language models (LLMs), limiting their practicality. In this work, we address this limitation by introducing registering, which achieves a new state of the art for decoder-only MNMT models. Specifically, we insert a set of artificial tokens specifying the target language, called registers, into the input sequence between the source and target tokens. By modifying the attention mask, target token generation attends only to the activations of the registers, which represent the source tokens in the target language space. Experiments on EC-40, a large-scale benchmark, show that our method outperforms related methods that rely on optimizing multilingual representations. We further scale up, collecting 9.3 billion sentence pairs across 24 languages from public datasets to pre-train two models, collectively named MITRE (multilingual translation with registers). One of them, MITRE-913M, outperforms NLLB-3.3B, achieves performance comparable to commercial LLMs, and shows strong adaptability in fine-tuning. Finally, we open-source our models to facilitate further research and development in MNMT: https://github.com/zhiqu22/mitre.
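The masking idea in this abstract can be illustrated with a small sketch. It assumes a single decoder-only sequence laid out as [source | registers | target] and a boolean mask in which True means "may attend". The layout, the function name `build_register_mask`, and the choice to let registers attend causally to the source are illustrative assumptions, not details taken from the MITRE code.

```python
def build_register_mask(src_len: int, num_registers: int, tgt_len: int) -> list[list[bool]]:
    """Return mask[i][j] == True if position i may attend to position j.

    Assumed layout: [source tokens | register tokens | target tokens].
    """
    total = src_len + num_registers + tgt_len
    reg_start = src_len
    tgt_start = src_len + num_registers
    mask = [[False] * total for _ in range(total)]
    for i in range(total):
        for j in range(i + 1):  # causal: never attend to future positions
            if i >= tgt_start and j < reg_start:
                # Target tokens are blocked from the raw source tokens;
                # they see the source only through the registers.
                continue
            mask[i][j] = True
    return mask
```

For example, with three source tokens, two registers and two target tokens, the first target position can see only the two registers and itself, while each register can still see the full source prefix.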

artificial intelligence, natural language, translation

2501.02979

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Japan (0.04)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

On the Shortcut Learning in Multilingual Neural Machine Translation

Wang, Wenxuan, Jiao, Wenxiang, Huang, Jen-tse, Tu, Zhaopeng, Lyu, Michael R.

arXiv.org Artificial Intelligence, Nov-15-2024

In this study, we revisit the commonly cited off-target issue in multilingual neural machine translation (MNMT). Through carefully designed experiments on different MNMT scenarios and models, we attribute the off-target issue to overfitting to the shortcuts of (non-centric, centric) language mappings. Specifically, the learned shortcuts bias MNMT models toward mistakenly translating non-centric languages into the centric language instead of the expected non-centric language in zero-shot translation. Analyses of learning dynamics show that shortcut learning generally occurs in the later stage of model training, and that multilingual pretraining accelerates and aggravates it. Based on these observations, we propose a simple and effective training strategy that eliminates the shortcuts in MNMT models by leveraging the forgetting nature of model training. The only difference from standard training is that we remove the training instances that may induce shortcut learning in the later stage of training. Without introducing any additional data or computational cost, our approach consistently and significantly improves zero-shot translation performance by alleviating shortcut learning across different MNMT models and benchmarks.
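The data schedule described in this abstract can be sketched as a filter that switches on late in training. This sketch assumes English is the centric language and assumes, for illustration only, that the instances "that may induce the shortcut" are the non-centric-to-centric pairs; the abstract does not spell out the exact selection rule. `Example`, `is_shortcut_prone`, `training_examples` and `switch_step` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator

CENTRIC = "en"  # assumption: English-centric training data


@dataclass(frozen=True)
class Example:
    src_lang: str
    tgt_lang: str
    src: str
    tgt: str


def is_shortcut_prone(ex: Example, centric: str = CENTRIC) -> bool:
    """Assumed rule: a (non-centric -> centric) pair may reinforce the shortcut."""
    return ex.src_lang != centric and ex.tgt_lang == centric


def training_examples(data: Iterable[Example], step: int, switch_step: int) -> Iterator[Example]:
    """Yield all data before `switch_step`; afterwards drop shortcut-prone pairs."""
    for ex in data:
        if step >= switch_step and is_shortcut_prone(ex):
            continue  # removed only in the later stage of training
        yield ex
```

Because the filter only removes instances, it adds no data and no extra computation, which is consistent with the cost claim in the abstract.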

machine learning, natural language, translation

2411.10581

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Asia > China > Hong Kong (0.05)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report > New Finding (0.87)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Understanding and Mitigating the Uncertainty in Zero-Shot Translation

Wang, Wenxuan, Jiao, Wenxiang, Wang, Shuo, Tu, Zhaopeng, Lyu, Michael R.

arXiv.org Artificial Intelligence, Oct-18-2024

Zero-shot translation is a promising direction for building a comprehensive multilingual neural machine translation (MNMT) system. However, its quality is still unsatisfactory due to off-target issues. In this paper, we aim to understand and alleviate the off-target issues from the perspective of uncertainty in zero-shot translation. By carefully examining the translation output and model confidence, we identify two uncertainties responsible for the off-target issues: extrinsic data uncertainty and intrinsic model uncertainty. Based on these observations, we propose two lightweight and complementary approaches: denoising the training data, and explicitly penalizing off-target translations through unlikelihood training during model training. Extensive experiments on both balanced and imbalanced datasets show that our approaches significantly improve zero-shot translation performance over strong MNMT baselines.
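The unlikelihood penalty mentioned in this abstract can be sketched with two loss terms. The sketch assumes the model's per-token probabilities for a reference translation and for an off-target negative translation are already available as plain floats. How negatives are built and how the two terms are weighted (the hypothetical `alpha`) are assumptions, not the paper's settings.

```python
import math
from typing import Sequence


def nll_loss(token_probs: Sequence[float]) -> float:
    """Standard likelihood term: -sum log p(y_t) over the reference tokens."""
    return -sum(math.log(p) for p in token_probs)


def unlikelihood_loss(neg_token_probs: Sequence[float], eps: float = 1e-12) -> float:
    """Unlikelihood term: -sum log(1 - p(y_t)) over tokens of an off-target negative.

    Minimizing it pushes probability mass away from the off-target tokens.
    """
    return -sum(math.log(max(1.0 - p, eps)) for p in neg_token_probs)


def combined_loss(pos_probs: Sequence[float], neg_probs: Sequence[float], alpha: float = 1.0) -> float:
    # alpha (hypothetical) weights the penalty on the off-target negative.
    return nll_loss(pos_probs) + alpha * unlikelihood_loss(neg_probs)
```

The penalty grows as the model assigns more probability to an off-target token, so confident off-target outputs are penalized most.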

large language model, natural language, translation

2205.10068

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Asia > China > Hong Kong (0.05)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Extending Multilingual Machine Translation through Imitation Learning

Lai, Wen, Hangya, Viktor, Fraser, Alexander

arXiv.org Artificial Intelligence, Nov-14-2023

Despite the growing variety of languages supported by existing multilingual neural machine translation (MNMT) models, most of the world's languages are still left behind. We aim to extend large-scale MNMT models to a new language, enabling translation between the newly added language and all of the already supported languages, in a challenging scenario: using only a parallel corpus between the new language and English. Previous approaches, such as continued training on parallel data that includes the new language, suffer from catastrophic forgetting (i.e., performance on other languages is reduced). Our novel approach, Imit-MNMT, treats the task as an imitation learning process that mimics the behavior of an expert, a technique widely used in computer vision but not well explored in NLP. More specifically, we construct a pseudo multi-parallel corpus of the new and the original languages by pivoting through English, and imitate the output distribution of the original MNMT model. Extensive experiments show that our approach significantly improves translation performance between the new and the original languages without severe catastrophic forgetting. We also demonstrate that our approach can solve the copy and off-target problems, two common issues in current large-scale MNMT models.
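The pivoting step in this abstract can be sketched as building one multi-parallel row per (new language, English) pair. This assumes a hypothetical `translate(text, src_lang, tgt_lang)` callable wrapping the original MNMT model and a hypothetical `"new"` key for the added language. The imitation objective itself, matching the original model's output distribution, is not shown.

```python
from typing import Callable, Dict, List, Tuple

Translate = Callable[[str, str, str], str]  # hypothetical: (text, src_lang, tgt_lang) -> text


def build_pseudo_multiparallel(
    new_en_pairs: List[Tuple[str, str]],
    original_langs: List[str],
    translate: Translate,
) -> List[Dict[str, str]]:
    """Pivot each (new-language, English) pair through English.

    The English side is translated by the original model into each already
    supported language, giving one multi-parallel row per input pair.
    """
    rows = []
    for new_text, en_text in new_en_pairs:
        row = {"new": new_text, "en": en_text}
        for lang in original_langs:
            if lang != "en":  # English is already present as the pivot
                row[lang] = translate(en_text, "en", lang)
        rows.append(row)
    return rows
```

With a stub translator such as `lambda t, s, tl: f"{tl}:{t}"`, one input pair and languages `["en", "de", "fr"]` produce a single row with keys `new`, `en`, `de`, and `fr`.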

new language, on-the-fly, translation

2311.08538

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)

Genre:

Research Report > Promising Solution (0.66)
Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Unlikelihood Tuning on Negative Samples Amazingly Improves Zero-Shot Translation

Zan, Changtong, Ding, Liang, Shen, Li, Lei, Yibin, Zhan, Yibing, Liu, Weifeng, Tao, Dacheng

arXiv.org Artificial Intelligence, Sep-28-2023

Zero-shot translation (ZST), which is generally based on a multilingual neural machine translation model, aims to translate between language pairs unseen in the training data. The common practice for guiding the zero-shot language mapping during inference is to deliberately insert the source and target language IDs, e.g., for English and for German. Recent studies have shown that language IDs sometimes fail to navigate the ZST task, so models suffer from the off-target problem (words in non-target languages appear in the generated translation), which makes it difficult to apply current multilingual translation models to a broad range of zero-shot language scenarios. To understand when and why the navigation capabilities of language IDs are weakened, we compare two extreme decoder input cases in the ZST directions: Off-Target (OFF) and On-Target (ON). By contrastively visualizing the contextual word representations (CWRs) of these cases with teacher forcing, we show that 1) the CWRs of different languages are effectively distributed in separate regions when the sentence and ID are matched (ON setting), and 2) when the sentence and ID are unmatched (OFF setting), the CWRs of different languages are chaotically distributed. Our analyses suggest that although language IDs work well in ideal ON settings, they become fragile and lose their navigation ability when faced with off-target tokens, which commonly occur during inference but are rare in training. In response, we employ unlikelihood tuning on the negative (OFF) samples to minimize their probability, so that the language IDs learn to discriminate between on- and off-target tokens during training. Experiments spanning 40 ZST directions show that our method reduces the off-target ratio by 48.0% on average, leading to a +9.1 BLEU improvement with only 0.3% extra tuning cost.
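The off-target ratio reported in this abstract is, in essence, the share of outputs not in the requested language. This sketch assumes a hypothetical `detect_lang` function (in practice a language-identification model) that returns a language code for each hypothesis. The language-identification tool and aggregation used in the paper are not stated here.

```python
from typing import Callable, Sequence


def off_target_ratio(
    hypotheses: Sequence[str],
    target_lang: str,
    detect_lang: Callable[[str], str],
) -> float:
    """Fraction of hypotheses whose detected language differs from the target."""
    if not hypotheses:
        return 0.0
    off = sum(1 for h in hypotheses if detect_lang(h) != target_lang)
    return off / len(hypotheses)
```

Averaging this value over all zero-shot directions gives a single summary figure of the kind the abstract reports.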

aclanthology, language id, translation

2309.16599

Country:

Europe > Netherlands > North Holland > Amsterdam (0.04)
Asia > China > Shanghai > Shanghai (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)

Genre: Research Report (1.00)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Multilingual Neural Machine Translation System for Indic to Indic Languages

Das, Sudhansu Bala, Panda, Divyajyoti, Mishra, Tapas Kumar, Patra, Bidyut Kr., Ekbal, Asif

arXiv.org Artificial Intelligence, Jun-22-2023

This paper presents an Indic-to-Indic (IL-IL) MNMT baseline model for 11 Indic languages (ILs), trained on the Samanantar corpus and evaluated on the Flores-200 corpus. All models are evaluated using BLEU. In addition, the languages are classified into three groups: East Indo-Aryan (EI), Dravidian (DR), and West Indo-Aryan (WI), and the effect of language relatedness on MNMT model efficiency is studied. Owing to the availability of large corpora from English (EN) to ILs, MNMT IL-IL models using EN as a pivot are also built and examined. To achieve this, English-Indic (EN-IL) models are also developed, with and without the use of related languages. Results reveal that using related languages is beneficial for the WI group only, is detrimental for the EI group, and has an inconclusive effect on the DR group, but it is useful for EN-IL models. Thus, related language groups are used to develop pivot MNMT models. Furthermore, the IL corpora are transliterated from their corresponding scripts into a modified ITRANS script, and the best MNMT models from the previous approaches are built on the transliterated corpus. Pivot models greatly improve the MNMT baselines, with AS-TA achieving the lowest BLEU score and PA-HI the highest. Among languages, AS, ML, and TA achieve the lowest BLEU scores, whereas HI, PA, and GU perform best. Transliteration also helps the models, with few exceptions. Across all languages, the largest score increases are observed for ML, TA, and BN, and the smallest average increases for KN, HI, and PA. The best model obtained is the PA-HI language pair trained on the PAWI transliterated corpus, which gives 24.29 BLEU.

artificial intelligence, natural language, translation

2306.12693

Country:

Asia > East Asia (0.04)
North America > United States > Texas > Travis County > Austin (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Genre:

Workflow (0.46)
Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Life-long Learning for Multilingual Neural Machine Translation with Knowledge Distillation

Zhao, Yang, Zhu, Junnan, Xiang, Lu, Zhang, Jiajun, Zhou, Yu, Zhai, Feifei, Zong, Chengqing

arXiv.org Artificial Intelligence, Dec-6-2022

A common scenario in Multilingual Neural Machine Translation (MNMT) is that translation tasks arrive sequentially and the training data of previous tasks is unavailable. In this scenario, current methods suffer heavily from catastrophic forgetting (CF). To alleviate CF, we investigate life-long learning methods based on knowledge distillation. Specifically, in the one-to-many scenario, we propose a multilingual distillation method that makes the new model (student) jointly learn the multilingual output of the old model (teacher) and the new task. In the many-to-one scenario, we find that direct distillation faces an extreme partial distillation problem, and we propose two methods to address it: pseudo input distillation and reverse teacher distillation. Experimental results on twelve translation tasks show that the proposed methods better consolidate previous knowledge and sharply alleviate CF.
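The one-to-many idea in this abstract, a student learning jointly from the teacher's output and the new task, can be sketched as a generic word-level distillation loss. It assumes the per-token objective mixes cross-entropy on the new task's reference with KL divergence to the teacher's distribution. The mixing weight `alpha` and this per-token formulation are assumptions, not the paper's exact objective.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two categorical distributions over the same vocabulary."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(
    student_probs: Sequence[float],
    teacher_probs: Sequence[float],
    gold_index: int,
    alpha: float = 0.5,
    eps: float = 1e-12,
) -> float:
    """Per-token loss: (1 - alpha) * CE(gold) + alpha * KL(teacher || student)."""
    ce = -math.log(max(student_probs[gold_index], eps))  # learn the new task
    kd = kl_divergence(teacher_probs, student_probs, eps)  # stay close to the old model
    return (1 - alpha) * ce + alpha * kd
```

Raising `alpha` favours retaining the old model's behaviour; lowering it favours fitting the new task.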

artificial intelligence, distillation, natural language

2212.028

Country:

Asia > China > Beijing > Beijing (0.05)
Europe > Italy (0.04)
Europe > Czechia > Prague (0.04)

Genre: Research Report (0.65)

Industry: Education > Educational Setting > Continuing Education (0.85)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Impact of Domain-Adapted Multilingual Neural Machine Translation in the Medical Domain

Rios, Miguel, Chereji, Raluca-Maria, Secara, Alina, Ciobanu, Dragos

arXiv.org Artificial Intelligence, Dec-5-2022

Multilingual Neural Machine Translation (MNMT) models leverage many language pairs during training to improve translation quality for low-resource languages by transferring knowledge from high-resource languages. We study the quality of a domain-adapted MNMT model in the medical domain for English-Romanian with automatic metrics and a human error typology annotation which includes terminology-specific error categories. We compare the out-of-domain MNMT with the in-domain adapted MNMT. The in-domain MNMT model outperforms the out-of-domain MNMT in all measured automatic metrics and produces fewer terminology errors.

artificial intelligence, machine translation, natural language

2212.02143

Country:

Europe > Romania > București - Ilfov Development Region > Municipality of Bucharest > Bucharest (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > United States > New Mexico > Santa Fe County > Santa Fe (0.04)

Genre: Research Report > Experimental Study (0.46)

Industry:

Health & Medicine > Therapeutic Area (0.68)
Health & Medicine > Health Care Providers & Services (0.47)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Robust Domain Adaptation for Pre-trained Multilingual Neural Machine Translation Models

Grosso, Mathieu, Ratnamogan, Pirashanth, Mathey, Alexis, Vanhuffel, William, Fotso, Michael Fotso

arXiv.org Artificial Intelligence, Oct-26-2022

Recent literature has demonstrated the potential of multilingual Neural Machine Translation (mNMT) models. However, the most efficient models are not well suited to specialized industries. In these cases, internal data is scarce and expensive to find in all language pairs. Therefore, fine-tuning an mNMT model on a specialized domain is hard. In this context, we decided to focus on a new task: domain adaptation of a pre-trained mNMT model on a single language pair while trying to maintain model quality on generic-domain data for all language pairs. The risk of losing performance on the generic domain and on the other pairs is high. This task is key for mNMT model adoption in industry and borders on many others. We propose a fine-tuning procedure for the generic mNMT model that combines embedding freezing and an adversarial loss. Our experiments demonstrate that, compared to a naive standard approach, the procedure improves performance on specialized data with a minimal loss of initial performance on the generic domain for all language pairs (+10.0 BLEU on specialized data, and -0.01 to -0.5 BLEU on WMT and Tatoeba for the other pairs, with M2M100).

machine learning, natural language, translation

2210.14979

Country:

Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

$m^4Adapter$: Multilingual Multi-Domain Adaptation for Machine Translation with a Meta-Adapter

Lai, Wen, Chronopoulou, Alexandra, Fraser, Alexander

arXiv.org Artificial Intelligence, Oct-21-2022

Multilingual neural machine translation models (MNMT) yield state-of-the-art performance when evaluated on data from a domain and language pair seen at training time. However, when an MNMT model is used to translate under domain shift or to a new language pair, performance drops dramatically. We consider a very challenging scenario: adapting the MNMT model to a new domain and to a new language pair at the same time. In this paper, we propose $m^4Adapter$ (Multilingual Multi-Domain Adaptation for Machine Translation with a Meta-Adapter), which combines domain and language knowledge using meta-learning with adapters. We present results showing that our approach is a parameter-efficient solution that effectively adapts a model to both a new language pair and a new domain while outperforming other adapter methods. An ablation study also shows that our approach more effectively transfers domain knowledge across languages and language information across domains.
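The adapters this abstract builds on can be illustrated with a bottleneck adapter's forward pass. This sketch assumes the common down-projection, ReLU, up-projection and residual design; the actual m^4Adapter architecture and its meta-learning procedure are not shown. Weights are plain nested lists purely for illustration, and `matvec` and `adapter_forward` are hypothetical names.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major: Matrix[out][in]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def adapter_forward(h: Vector, w_down: Matrix, w_up: Matrix) -> Vector:
    """h + W_up · ReLU(W_down · h); only w_down and w_up would be trained."""
    z = [max(0.0, v) for v in matvec(w_down, h)]  # bottleneck + ReLU
    delta = matvec(w_up, z)                        # project back to model size
    return [hi + di for hi, di in zip(h, delta)]   # residual connection
```

The residual path means an adapter whose up-projection is zero leaves the frozen model's hidden state unchanged, which is one reason adapters are a parameter-efficient way to add domain or language knowledge.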

artificial intelligence, machine learning, natural language

2210.11912

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)

Genre: Research Report > New Finding (0.86)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)