Machine Translation
The Challenge of Achieving Attributability in Multilingual Table-to-Text Generation with Question-Answer Blueprints
Multilingual Natural Language Generation (NLG) is challenging due to the lack of training data for low-resource languages. However, some low-resource languages have up to tens of millions of speakers globally, making it important to improve NLG tools for them. Table-to-Text NLG is an excellent measure of models' reasoning abilities but is very challenging in the multilingual setting. System outputs are often not attributable, or faithful, to the data in the source table. Intermediate planning techniques like Question-Answer (QA) blueprints have been shown to improve attributability on summarisation tasks. This work explores whether QA blueprints make multilingual Table-to-Text outputs more attributable to the input tables. This paper extends the challenging multilingual Table-to-Text dataset, TaTA, which includes African languages, with QA blueprints. Sequence-to-sequence language models are then finetuned on this dataset, with and without blueprints. Results show that QA blueprints improve performance for models finetuned and evaluated only on English examples, but do not demonstrate gains in the multilingual setting. This is due to inaccuracies in machine translating the blueprints from English into target languages when generating the training data, and models failing to rely closely on the blueprints they generate. An in-depth analysis is conducted on why this is challenging.
Beyond Vanilla Fine-Tuning: Leveraging Multistage, Multilingual, and Domain-Specific Methods for Low-Resource Machine Translation
Thillainathan, Sarubi, Yuan, Songchen, Lee, En-Shiun Annie, Jayasena, Sanath, Ranathunga, Surangika
Fine-tuning multilingual sequence-to-sequence large language models (msLLMs) has shown promise in developing neural machine translation (NMT) systems for low-resource languages (LRLs). However, conventional single-stage fine-tuning methods struggle in extremely low-resource NMT settings, where training data is very limited. This paper contributes to artificial intelligence by proposing two approaches for adapting msLLMs in these challenging scenarios: (1) continual pre-training (CPT), where the msLLM is further trained with domain-specific monolingual data to compensate for the under-representation of LRLs, and (2) intermediate task transfer learning (ITTL), a method that fine-tunes the msLLM with both in-domain and out-of-domain parallel data to enhance its translation capabilities across various domains and tasks. As an application in engineering, these methods are implemented in NMT systems for Sinhala, Tamil, and English (six language pairs) in domain-specific, extremely low-resource settings (datasets containing fewer than 100,000 samples). Our experiments reveal that these approaches enhance translation performance by an average of +1.47 bilingual evaluation understudy (BLEU) score compared to the standard single-stage fine-tuning baseline across all translation directions. Additionally, a multi-model ensemble further improves performance by an additional BLEU score.
Evaluating Multimodal Language Models as Visual Assistants for Visually Impaired Users
Karamolegkou, Antonia, Nikandrou, Malvina, Pantazopoulos, Georgios, Villegas, Danae Sanchez, Rust, Phillip, Dhar, Ruchira, Hershcovich, Daniel, Søgaard, Anders
This paper explores the effectiveness of Multimodal Large Language models (MLLMs) as assistive technologies for visually impaired individuals. We conduct a user survey to identify adoption patterns and key challenges users face with such technologies. Despite a high adoption rate of these models, our findings highlight concerns related to contextual understanding, cultural sensitivity, and complex scene understanding, particularly for individuals who may rely solely on them for visual interpretation. Informed by these results, we collate five user-centred tasks with image and video inputs, including a novel task on Optical Braille Recognition. Our systematic evaluation of twelve MLLMs reveals that further advancements are necessary to overcome limitations related to cultural context, multilingual support, Braille reading comprehension, assistive object recognition, and hallucinations. This work provides critical insights into the future direction of multimodal AI for accessibility, underscoring the need for more inclusive, robust, and trustworthy visual assistance technologies.
Non-Monotonic Attention-based Read/Write Policy Learning for Simultaneous Translation
Ahmed, Zeeshan, Seide, Frank, Liu, Zhe, Rabatin, Rastislav, Kolar, Jachym, Moritz, Niko, Xie, Ruiming, Merello, Simone, Fuegen, Christian
Simultaneous or streaming machine translation generates translation while reading the input stream. These systems face a quality/latency trade-off, aiming to achieve high translation quality similar to non-streaming models with minimal latency. We propose an approach that efficiently manages this trade-off. By enhancing a pretrained non-streaming model, which was trained with a seq2seq mechanism and represents the upper bound in quality, we convert it into a streaming model by utilizing the alignment between source and target tokens. This alignment is used to learn a read/write decision boundary for reliable translation generation with minimal input. During training, the model learns the decision boundary through a read/write policy module, employing supervised learning on the alignment points (pseudo labels). The read/write policy module, a small binary classification unit, can control the quality/latency trade-off during inference. Experimental results show that our model outperforms several strong baselines and narrows the gap with the non-streaming baseline model.
Training in translation tools and technologies: Findings of the EMT survey 2023
Rothwell, Andrew, Moorkens, Joss, Svoboda, Tomas
This article reports on the third iteration of a survey of computerized tools and technologies taught as part of postgraduate translation training programmes. While the survey was carried out under the aegis of the EMT Network, more than half of responses are from outside that network. The results show the responsiveness of programmes to innovations in translation technology, with increased compulsory inclusion of machine translation, post-editing, and quality evaluation, and a rapid response to the release of generative tools. The flexibility required during the Covid-19 pandemic has also led to some lasting changes to programmes. While the range of tools being taught has continued to expand, programmes seem to be consolidating their core offering around cloud-based software with cost-free academic access. There has also been an increase in the embedding of professional contexts and workflows associated with translation technology. Generic file management and data security skills have increased in perceived importance, and legal and ethical issues related to translation data have also become more prominent. In terms of course delivery the shift away from conventional labs identified in EMT2017 has accelerated markedly, no doubt partly driven by the pandemic, accompanied by a dramatic expansion in the use of students' personal devices.
Sociotechnical Effects of Machine Translation
Moorkens, Joss, Way, Andy, Lankford, Séamus
While the previous chapters have shown how machine translation (MT) can be useful, in this chapter we discuss some of the side-effects and risks that are associated, and how they might be mitigated. With the move to neural MT and approaches using Large Language Models (LLMs), there is an associated impact on climate change, as the models built by multinational corporations are massive. They are hugely expensive to train, consume large amounts of electricity, and output huge volumes of kgCO2 to boot. However, smaller models which still perform to a high level of quality can be built with much lower carbon footprints, and tuning pre-trained models saves on the requirement to train from scratch. We also discuss the possible detrimental effects of MT on translators and other users. The topics of copyright and ownership of data are discussed, as well as ethical considerations on data and MT use. Finally, we show how if done properly, using MT in crisis scenarios can save lives, and we provide a method of how this might be done.
Low-resource Information Extraction with the European Clinical Case Corpus
Ghosh, Soumitra, Altuna, Begona, Farzi, Saeed, Ferrazzi, Pietro, Lavelli, Alberto, Mezzanotte, Giulia, Speranza, Manuela, Magnini, Bernardo
We present E3C-3.0, a multilingual dataset in the medical domain, comprising clinical cases annotated with diseases and test-result relations. The dataset includes both native texts in five languages (English, French, Italian, Spanish and Basque) and texts translated and projected from the English source into five target languages (Greek, Italian, Polish, Slovak, and Slovenian). A semi-automatic approach has been implemented, including automatic annotation projection based on Large Language Models (LLMs) and human revision. We present several experiments showing that current state-of-the-art LLMs can benefit from being fine-tuned on the E3C-3.0 dataset. We also show that transfer learning in different languages is very effective, mitigating the scarcity of data. Finally, we compare performance both on native data and on projected data. We release the data at https://huggingface.co/collections/NLP-FBK/e3c-projected-676a7d6221608d60e4e9fd89 .
Contextual Metric Meta-Evaluation by Measuring Local Metric Accuracy
Deviyani, Athiya, Diaz, Fernando
Meta-evaluation of automatic evaluation metrics -- assessing evaluation metrics themselves -- is crucial for accurately benchmarking natural language processing systems and has implications for scientific inquiry, production model development, and policy enforcement. While existing approaches to metric meta-evaluation focus on general statements about the absolute and relative quality of metrics across arbitrary system outputs, in practice, metrics are applied in highly contextual settings, often measuring the performance for a highly constrained set of system outputs. For example, we may only be interested in evaluating a specific model or class of models. We introduce a method for contextual metric meta-evaluation by comparing the local metric accuracy of evaluation metrics. Across translation, speech recognition, and ranking tasks, we demonstrate that the local metric accuracies vary both in absolute value and relative effectiveness as we shift across evaluation contexts. This observed variation highlights the importance of adopting context-specific metric evaluations over global ones.
Low-resource Machine Translation for Code-switched Kazakh-Russian Language Pair
Borisov, Maksim, Kozhirbayev, Zhanibek, Malykh, Valentin
Machine translation for low resource language pairs is a challenging task. This task could become extremely difficult once a speaker uses code switching. We propose a method to build a machine translation model for code-switched Kazakh-Russian language pair with no labeled data. Our method is basing on generation of synthetic data. Additionally, we present the first codeswitching Kazakh-Russian parallel corpus and the evaluation results, which include a model achieving 16.48 BLEU almost reaching an existing commercial system and beating it by human evaluation.
HausaNLP at SemEval-2025 Task 2: Entity-Aware Fine-tuning vs. Prompt Engineering in Entity-Aware Machine Translation
Abubakar, Abdulhamid, Abdulkadir, Hamidatu, Abdullahi, Ibrahim Rabiu, Khalid, Abubakar Auwal, Wali, Ahmad Mustapha, Umar, Amina Aminu, Bala, Maryam, Sani, Sani Abdullahi, Ahmad, Ibrahim Said, Muhammad, Shamsuddeen Hassan, Abdulmumin, Idris, Marivate, Vukosi
This paper presents our findings for SemEval 2025 Task 2, a shared task on entity-aware machine translation (EA-MT). The goal of this task is to develop translation models that can accurately translate English sentences into target languages, with a particular focus on handling named entities, which often pose challenges for MT systems. The task covers 10 target languages with English as the source. In this paper, we describe the different systems we employed, detail our results, and discuss insights gained from our experiments.