Goto

Collaborating Authors

 Machine Translation


Optimizing transformer-based machine translation model for single GPU training: a hyperparameter ablation study

arXiv.org Artificial Intelligence

In machine translation tasks, the relationship between model complexity and performance is often presumed to be linear, driving an increase in the number of parameters and consequent demands for computational resources like multiple GPUs. To explore this assumption, this study systematically investigates the effects of hyperparameters through ablation on a sequence-to-sequence machine translation pipeline, utilizing a single NVIDIA A100 GPU. Contrary to expectations, our experiments reveal that combinations with the most parameters were not necessarily the most effective. This unexpected insight prompted a careful reduction in parameter sizes, uncovering "sweet spots" that enable training sophisticated models on a single GPU without compromising translation quality. The findings demonstrate an intricate relationship between hyperparameter selection, model size, and computational resource needs. The insights from this study contribute to the ongoing efforts to make machine translation more accessible and cost-effective, emphasizing the importance of precise hyperparameter tuning over mere scaling.


Personalised Language Modelling of Screen Characters Using Rich Metadata Annotations

arXiv.org Artificial Intelligence

Language models that are sensitive to external context can more effectively capture the speaking patterns of individuals with specific characteristics or in particular environments. However, obtaining and leveraging such annotations can be challenging. In this work, we show how to leverage rich character and film annotations to personalise language models in a scalable manner. Our best model can reduce perplexity by up to 6.5% compared to a parameter-matched language model. Our approach performs on par with speaker-specific fine-tuning when the fine-tuning data (i.e. past dialogue) for individual speakers is available. On top of that, it also generalises well to a scenario with no such data, relying on combinations of demographic characteristics expressed via metadata. Our findings are consistent across two corpora, one of which is also a contribution of this paper: Cornell-rich contains rich manual annotations for 863 speaking characters from the Cornell Movie Dialog Corpus, including features such as characteristic quotes and character descriptions, along with six automatically extracted metadata features for over 95% of the featured films. Finally, we also present a cost-benefit analysis highlighting which annotations are most cost-effective in reducing perplexity.


Exploring Linguistic Similarity and Zero-Shot Learning for Multilingual Translation of Dravidian Languages

arXiv.org Artificial Intelligence

Current research in zero-shot translation is plagued by several issues such as high compute requirements, increased training time and off target translations. Proposed remedies often come at the cost of additional data or compute requirements. Pivot based neural machine translation is preferred over a single-encoder model for most settings despite the increased training and evaluation time. In this work, we overcome the shortcomings of zero-shot translation by taking advantage of transliteration and linguistic similarity. We build a single encoder-decoder neural machine translation system for Dravidian-Dravidian multilingual translation and perform zero-shot translation. We compare the data vs zero-shot accuracy tradeoff and evaluate the performance of our vanilla method against the current state of the art pivot based method. We also test the theory that morphologically rich languages require large vocabularies by restricting the vocabulary using an optimal transport based technique. Our model manages to achieves scores within 3 BLEU of large-scale pivot-based models when it is trained on 50\% of the language directions.


Towards Multiple References Era -- Addressing Data Leakage and Limited Reference Diversity in NLG Evaluation

arXiv.org Artificial Intelligence

N-gram matching-based evaluation metrics, such as BLEU and chrF, are widely utilized across a range of natural language generation (NLG) tasks. However, recent studies have revealed a weak correlation between these matching-based metrics and human evaluations, especially when compared with neural-based metrics like BLEURT. In this paper, we conjecture that the performance bottleneck in matching-based metrics may be caused by the limited diversity of references. To address this issue, we propose to utilize \textit{multiple references} to enhance the consistency between these metrics and human evaluations. Within the WMT Metrics benchmarks, we observe that the multi-references F200spBLEU surpasses the conventional single-reference one by an accuracy improvement of 7.2\%. Remarkably, it also exceeds the neural-based BERTscore by an accuracy enhancement of 3.9\%. Moreover, we observe that the data leakage issue in large language models (LLMs) can be mitigated to a large extent by our multi-reference metric. We release the code and data at \url{https://github.com/SefaZeng/LLM-Ref}


Gmail language translation finally appears on mobile

PCWorld

We've all taken automatic language translation for granted, especially where web pages are concerned. But Google has finally added it to where it might be the most useful: the mobile version of Gmail. To be fair, automatic translation has been a feature of Gmail for years, but only on the web. Now, Google is building it into its mobile app, where millions of people access their email every day. Android users will see automatic translation within their Gmail app via an update scheduled beginning today, August 8; iOS users will receive an update as early as August 21.


Character-level NMT and language similarity

arXiv.org Artificial Intelligence

We explore the effectiveness of character-level neural machine translation using Transformer architecture for various levels of language similarity and size of the training dataset on translation between Czech and Croatian, German, Hungarian, Slovak, and Spanish. We evaluate the models using automatic MT metrics and show that translation between similar languages benefits from character-level input segmentation, while for less related languages, character-level vanilla Transformer-base often lags behind subword-level segmentation. We confirm previous findings that it is possible to close the gap by finetuning the already trained subword-level models to character-level.


Negative Lexical Constraints in Neural Machine Translation

arXiv.org Artificial Intelligence

This paper explores negative lexical constraining in English to Czech neural machine translation. Negative lexical constraining is used to prohibit certain words or expressions in the translation produced by the neural translation model. We compared various methods based on modifying either the decoding process or the training data. The comparison was performed on two tasks: paraphrasing and feedback-based translation refinement. We also studied to which extent these methods "evade" the constraints presented to the model (usually in the dictionary form) by generating a different surface form of a given constraint.We propose a way to mitigate the issue through training with stemmed negative constraints to counter the model's ability to induce a variety of the surface forms of a word that can result in bypassing the constraint. We demonstrate that our method improves the constraining, although the problem still persists in many cases.


QAmeleon: Multilingual QA with Only 5 Examples

arXiv.org Artificial Intelligence

The availability of large, high-quality datasets has been one of the main drivers of recent progress in question answering (QA). Such annotated datasets however are difficult and costly to collect, and rarely exist in languages other than English, rendering QA technology inaccessible to underrepresented languages. An alternative to building large monolingual training datasets is to leverage pre-trained language models (PLMs) under a few-shot learning setting. Our approach, QAmeleon, uses a PLM to automatically generate multilingual data upon which QA models are trained, thus avoiding costly annotation. Prompt tuning the PLM for data synthesis with only five examples per language delivers accuracy superior to translation-based baselines, bridges nearly 60% of the gap between an English-only baseline and a fully supervised upper bound trained on almost 50,000 hand labeled examples, and always leads to substantial improvements compared to fine-tuning a QA model directly on labeled examples in low resource settings. Experiments on the TyDiQA-GoldP and MLQA benchmarks show that few-shot prompt tuning for data synthesis scales across languages and is a viable alternative to large-scale annotation.


Towards Scene-Text to Scene-Text Translation

arXiv.org Artificial Intelligence

In this work, we study the task of ``visually" translating scene text from a source language (e.g., English) to a target language (e.g., Chinese). Visual translation involves not just the recognition and translation of scene text but also the generation of the translated image that preserves visual features of the text, such as font, size, and background. There are several challenges associated with this task, such as interpolating font to unseen characters and preserving text size and the background. To address these, we introduce VTNet, a novel conditional diffusion-based method. To train the VTNet, we create a synthetic cross-lingual dataset of 600K samples of scene text images in six popular languages, including English, Hindi, Tamil, Chinese, Bengali, and German. We evaluate the performance of VTnet through extensive experiments and comparisons to related methods. Our model also surpasses the previous state-of-the-art results on the conventional scene-text editing benchmarks. Further, we present rigorous qualitative studies to understand the strengths and shortcomings of our model. Results show that our approach generalizes well to unseen words and fonts. We firmly believe our work can benefit real-world applications, such as text translation using a phone camera and translating educational materials. Code and data will be made publicly available.


Spanish Pre-trained BERT Model and Evaluation Data

arXiv.org Artificial Intelligence

The Spanish language is one of the top 5 spoken languages in the world. Nevertheless, finding resources to train or evaluate Spanish language models is not an easy task. In this paper we help bridge this gap by presenting a BERT-based language model pre-trained exclusively on Spanish data. As a second contribution, we also compiled several tasks specifically for the Spanish language in a single repository much in the spirit of the GLUE benchmark. By fine-tuning our pretrained Spanish model, we obtain better results compared to other BERT-based models pre-trained on multilingual corpora for most of the tasks, even achieving a new state-of-the-art on some of them. We have publicly released our model, the pre-training data, and the compilation of the Spanish benchmarks. The field of natural language processing (NLP) has made incredible progress in the last two years.