original language
AILS-NTUA at SemEval-2025 Task 3: Leveraging Large Language Models and Translation Strategies for Multilingual Hallucination Detection
Karkani, Dimitra, Lymperaiou, Maria, Filandrianos, Giorgos, Spanos, Nikolaos, Voulodimos, Athanasios, Stamou, Giorgos
Multilingual hallucination detection stands as an underexplored challenge, which the Mu-SHROOM shared task seeks to address. In this work, we propose an efficient, training-free LLM prompting strategy that enhances detection by translating multilingual text spans into English. Our approach achieves competitive rankings across multiple languages, securing two first positions in low-resource languages. The consistency of our results highlights the effectiveness of our translation strategy for hallucination detection, demonstrating its applicability regardless of the source language.
MoE-LPR: Multilingual Extension of Large Language Models through Mixture-of-Experts with Language Priors Routing
Zhou, Hao, Wang, Zhijun, Huang, Shujian, Huang, Xin, Han, Xue, Feng, Junlan, Deng, Chao, Luo, Weihua, Chen, Jiajun
Large Language Models (LLMs) are often English-centric due to the disproportionate distribution of languages in their pre-training data. Enhancing non-English language capabilities through post-pretraining often results in catastrophic forgetting of abilities in the original languages. Previous methods either achieve good expansion with severe forgetting or slight forgetting with poor expansion, indicating the challenge of balancing language expansion while preventing forgetting. In this paper, we propose a method called MoE-LPR (Mixture-of-Experts with Language Priors Routing) to alleviate this problem. MoE-LPR employs a two-stage training approach to enhance the multilingual capability. First, the model is post-pretrained into a Mixture-of-Experts (MoE) architecture by upcycling, where all the original parameters are frozen and new experts are added. In this stage, we focus on improving ability in the expanded languages, without using any original language data. Then, the model reviews the knowledge of the original languages with replay data amounting to less than 1% of post-pretraining, where we incorporate language priors routing to better recover the abilities of the original languages. Evaluations on multiple benchmarks show that MoE-LPR outperforms other post-pretraining methods. Freezing original parameters preserves original language knowledge while adding new experts preserves the learning ability. Reviewing with LPR enables effective utilization of multilingual knowledge within the parameters. Additionally, the MoE architecture maintains the same inference overhead while increasing total model parameters. Extensive experiments demonstrate MoE-LPR's effectiveness in improving expanded languages and preserving original language proficiency with superior scalability. Code and scripts are freely available at https://github.com/zjwang21/MoE-LPR.git.
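The routing idea in this abstract can be illustrated with a small sketch. The abstract does not specify the routing loss, so the code below assumes one plausible reading: the frozen original parameters act as expert 0, and a language prior penalizes original-language tokens whose router probability for that expert is low. The function names, the top-k choice, and the toy logits are all hypothetical and are not taken from the authors' code.

```python
import math
from typing import Dict, List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over router logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits: List[float], k: int = 2) -> Dict[int, float]:
    """Pick the k most probable experts and renormalize their weights.

    Only k experts run per token, so per-token compute stays fixed
    even when more experts (and parameters) are added.
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}


def language_prior_loss(
    router_logits: List[float],
    is_original_language: bool,
    frozen_expert: int = 0,
) -> float:
    """Assumed prior: cross-entropy pushing original-language tokens
    toward the frozen original expert; no penalty for other tokens."""
    if not is_original_language:
        return 0.0
    return -math.log(softmax(router_logits)[frozen_expert])


# Toy usage: one token, three experts (expert 0 = frozen original FFN).
weights = top_k_route([2.0, 1.0, 0.1], k=2)
loss = language_prior_loss([2.0, 1.0, 0.1], is_original_language=True)
```

Under this assumption, the prior loss would be added to the language-modeling loss only during the review stage, on the small replay set of original-language data.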
Extending Multilingual Machine Translation through Imitation Learning
Lai, Wen, Hangya, Viktor, Fraser, Alexander
Despite the growing variety of languages supported by existing multilingual neural machine translation (MNMT) models, most of the world's languages are still being left behind. We aim to extend large-scale MNMT models to a new language, allowing for translation between the newly added and all of the already supported languages in a challenging scenario: using only a parallel corpus between the new language and English. Previous approaches, such as continued training on parallel data including the new language, suffer from catastrophic forgetting (i.e., performance on other languages is reduced). Our novel approach Imit-MNMT treats the task as an imitation learning process, which mimics the behavior of an expert, a technique widely used in the computer vision area, but not well explored in NLP. More specifically, we construct a pseudo multi-parallel corpus of the new and the original languages by pivoting through English, and imitate the output distribution of the original MNMT model. Extensive experiments show that our approach significantly improves the translation performance between the new and the original languages, without severe catastrophic forgetting. We also demonstrate that our approach is capable of solving copy and off-target problems, which are two common issues in current large-scale MNMT models.
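The two steps named in this abstract, pivoting through English and imitating the expert's output distribution, can be sketched roughly as follows. This is a minimal illustration rather than the Imit-MNMT implementation: the helper names, the dictionary-based toy translator, and the use of a plain KL divergence as the imitation objective are assumptions made for the example.

```python
import math
from typing import Callable, Dict, List, Tuple


def build_pseudo_parallel(
    new_en_pairs: List[Tuple[str, str]],
    en_to_orig: Dict[str, Callable[[str], str]],
) -> List[Dict[str, str]]:
    """Pivot through English: translate the English side of each
    (new language, English) pair into every original language."""
    corpus = []
    for new_sent, en_sent in new_en_pairs:
        row = {"new": new_sent, "en": en_sent}
        for lang, translate in en_to_orig.items():
            row[lang] = translate(en_sent)  # output of the expert MNMT model
        corpus.append(row)
    return corpus


def kl_divergence(expert: List[float], learner: List[float]) -> float:
    """KL(expert || learner) over one next-token distribution.

    Assumes both lists are probability distributions over the same
    vocabulary and that the learner assigns nonzero mass everywhere.
    """
    return sum(p * math.log(p / q) for p, q in zip(expert, learner) if p > 0)


# Toy stand-in for the original MNMT model's English-to-German direction.
def toy_en_to_de(sentence: str) -> str:
    lexicon = {"hello": "hallo", "world": "welt"}
    return " ".join(lexicon.get(w, w) for w in sentence.split())


corpus = build_pseudo_parallel([("xx yy", "hello world")], {"de": toy_en_to_de})
imitation_loss = kl_divergence([0.7, 0.2, 0.1], [0.5, 0.3, 0.2])
```

In a real system the learner would be trained to reduce this divergence at each decoding step, so that it stays close to the original model on the already supported languages.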
DLAMA: A Framework for Curating Culturally Diverse Facts for Probing the Knowledge of Pretrained Language Models
A few benchmarking datasets have been released to evaluate the factual knowledge of pretrained language models. These benchmarks (e.g., LAMA and ParaRel) are mainly developed in English and are later translated to form new multilingual versions (e.g., mLAMA and mParaRel). Results on these multilingual benchmarks suggest that using English prompts to recall the facts from multilingual models usually yields significantly better and more consistent performance than using non-English prompts. Our analysis shows that mLAMA is biased toward facts from Western countries, which might affect the fairness of probing models. We propose a new framework for curating factual triples from Wikidata that are culturally diverse. A new benchmark, DLAMA-v1, is built from factual triples covering three pairs of contrasting cultures, with a total of 78,259 triples from 20 relation predicates. The three pairs comprise facts representing the (Arab and Western), (Asian and Western), and (South American and Western) countries respectively. The more balanced benchmark (DLAMA-v1) supports the finding that mBERT performs better on Western facts than non-Western ones, while monolingual Arabic, English, and Korean models tend to perform better on their culturally proximate facts. Moreover, both monolingual and multilingual models tend to make a prediction that is culturally or geographically relevant to the correct label, even if the prediction is wrong.
Domain, Translationese and Noise in Synthetic Data for Neural Machine Translation
Bogoychev, Nikolay, Sennrich, Rico
The quality of neural machine translation can be improved by leveraging additional monolingual resources to create synthetic training data. Source-side monolingual data can be (forward-)translated into the target language for self-training; target-side monolingual data can be back-translated. It has been widely reported that back-translation delivers superior results, but could this be due to artefacts in the test sets? We perform a case study using a French-English news translation task and separate test sets based on their original languages. We show that forward translation delivers superior gains in terms of BLEU on sentences that were originally in the source language, complementing previous studies which show large improvements with back-translation on sentences that were originally in the target language. To better understand when and why forward and back-translation are effective, we study the role of domains, translationese, and noise. While translationese effects are well known to influence MT evaluation, we also find evidence that news data from different languages shows subtle domain differences, which is another explanation for varying performance on different portions of the test set. We perform additional low-resource experiments which demonstrate that forward translation is more sensitive to the quality of the initial translation system than back-translation, and tends to perform worse in low-resource settings.
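The difference between the two kinds of synthetic data described in this abstract comes down to which side of each training pair is machine-generated. The sketch below makes that explicit. The dictionary-based "translators" are toy stand-ins for real MT systems, and all names are hypothetical.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def forward_translate(
    src_mono: List[str], src_to_tgt: Callable[[str], str]
) -> List[Pair]:
    """Self-training data: real source text, synthetic target text."""
    return [(s, src_to_tgt(s)) for s in src_mono]


def back_translate(
    tgt_mono: List[str], tgt_to_src: Callable[[str], str]
) -> List[Pair]:
    """Back-translation data: synthetic source text, real target text."""
    return [(tgt_to_src(t), t) for t in tgt_mono]


# Toy word-for-word "models" for a French-to-English system.
FR_EN = {"bonjour": "hello", "monde": "world"}
EN_FR = {en: fr for fr, en in FR_EN.items()}


def toy_fr_to_en(sentence: str) -> str:
    return " ".join(FR_EN.get(w, w) for w in sentence.split())


def toy_en_to_fr(sentence: str) -> str:
    return " ".join(EN_FR.get(w, w) for w in sentence.split())


ft_pairs = forward_translate(["bonjour monde"], toy_fr_to_en)
bt_pairs = back_translate(["hello world"], toy_en_to_fr)
```

With forward translation, any error made by the initial system ends up on the target side that the model learns to produce, which is consistent with the abstract's finding that forward translation is more sensitive to the quality of that system.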
Using AI And ML For Translation Solutions - DZone AI
Natural Language Processing (NLP) is a form of Artificial Intelligence that learns words and patterns of words so that it can respond to human searches and questions. Siri and Alexa are examples of this technology, and it is continually improving. As more and more conversations are held with these machines, they continue to learn and respond more accurately. Machines are also in use for translations.
Q&A: Douglas Hofstadter on why AI is far from intelligent
The field of artificial intelligence may finally be coming back around to Douglas Hofstadter. Since winning a Pulitzer Prize in nonfiction for his 1979 book Gödel, Escher, Bach: an Eternal Golden Braid, Hofstadter, 72, has been quietly thinking about thinking, and how we might get computers to do it. In the early days of AI research in the 1950s and 60s, the goal was to create computers that think and learn the way humans do, by remodeling our ability to intuitively understand the world around us. But thinking turned out to be more complicated than something that could fit in a 1950s computer program. What did eventually yield results, though, was giving up on thinking altogether, focusing computers instead on highly specific tasks and giving them vast amounts of relevant data, resulting in the AI boom we see today. A computer can beat a human at chess not by searching for the satisfaction of making an elegant move, but by sifting through millions of previously played games to see which move is more likely to lead to victory.