 Machine Translation


Tencent AI Lab - Shanghai Jiao Tong University Low-Resource Translation System for the WMT22 Translation Task

arXiv.org Artificial Intelligence

This paper describes Tencent AI Lab - Shanghai Jiao Tong University (TAL-SJTU) Low-Resource Translation systems for the WMT22 shared task. We participate in the general translation task on English$\Leftrightarrow$Livonian. Our system is based on M2M100 with novel techniques that adapt it to the target language pair. (1) Cross-model word embedding alignment: inspired by cross-lingual word embedding alignment, we successfully transfer a pre-trained word embedding to M2M100, enabling it to support Livonian. (2) Gradual adaptation strategy: we exploit Estonian and Latvian as auxiliary languages for many-to-many translation training and then adapt to English-Livonian. (3) Data augmentation: to enlarge the parallel data for English-Livonian, we construct pseudo-parallel data with Estonian and Latvian as pivot languages. (4) Fine-tuning: to make the most of all available data, we fine-tune the model with the validation set and online back-translation, further boosting the performance. In model evaluation: (1) We find that previous work underestimated the translation performance of Livonian due to inconsistent Unicode normalization, which may cause a discrepancy of up to 14.9 BLEU score. (2) In addition to the standard validation set, we also employ round-trip BLEU to evaluate the models, which we find more appropriate for this task. Finally, our unconstrained system achieves BLEU scores of 17.0 and 30.4 for English to/from Livonian.
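The Unicode normalization issue mentioned in the abstract can be illustrated with a small sketch. It assumes the mismatch comes from hypothesis and reference text using different normalization forms (composed vs. decomposed); the abstract does not say which forms were involved. The helper names `normalize_nfc` and `exact_token_matches` are hypothetical, and the token match count is only a stand-in for BLEU, not BLEU itself.

```python
import unicodedata


def normalize_nfc(text: str) -> str:
    """Return the NFC (composed) form of the text."""
    return unicodedata.normalize("NFC", text)


def exact_token_matches(hyp: str, ref: str) -> int:
    """Count positions where whitespace tokens are identical (a crude proxy, not BLEU)."""
    return sum(h == r for h, r in zip(hyp.split(), ref.split()))


# 'ǟ' (U+01DF) is a letter used in Livonian orthography.
composed = "\u01df"
# The same letter as a base character followed by combining marks.
decomposed = unicodedata.normalize("NFD", composed)

# Illustrative token strings only, not real Livonian words.
hyp = f"x{composed}y z{composed}"
ref = f"x{decomposed}y z{decomposed}"

raw_matches = exact_token_matches(hyp, ref)
normalized_matches = exact_token_matches(normalize_nfc(hyp), normalize_nfc(ref))
print(raw_matches, normalized_matches)
```

The two strings render identically but compare unequal until both sides are normalized to the same form, which is how a string-matching metric can undercount correct output.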


How sensitive are translation systems to extra contexts? Mitigating gender bias in Neural Machine Translation models through relevant contexts

arXiv.org Artificial Intelligence

Neural Machine Translation systems built on top of Transformer-based architectures are routinely improving the state-of-the-art in translation quality according to word-overlap metrics. However, a growing number of studies also highlight the inherent gender bias that these models incorporate during training, which reflects poorly in their translations. In this work, we investigate whether these models can be instructed to fix their bias during inference using targeted, guided instructions as contexts. By translating relevant contextual sentences during inference along with the input, we observe large improvements in reducing the gender bias in translations, across three popular test suites (WinoMT, BUG, SimpleGen). We further propose a novel metric to assess several large pre-trained models (OPUS-MT, M2M-100) on their sensitivity towards using contexts during translation to correct their biases. Our approach requires no fine-tuning and thus can be used easily in production systems to de-bias translations from stereotypical gender-occupation bias. We hope our method, along with our metric, can be used to build better, bias-free translation systems.


Towards Robust k-Nearest-Neighbor Machine Translation

arXiv.org Artificial Intelligence

k-Nearest-Neighbor Machine Translation (kNN-MT) has become an important research direction of NMT in recent years. Its main idea is to retrieve useful key-value pairs from an additional datastore to modify translations without updating the NMT model. However, noisy retrieved pairs can dramatically deteriorate the model performance. In this paper, we conduct a preliminary study and find that this problem results from not fully exploiting the prediction of the NMT model. To alleviate the impact of noise, we propose a confidence-enhanced kNN-MT model with robust training. Concretely, we introduce the NMT confidence to refine the modeling of two important components of kNN-MT: the kNN distribution and the interpolation weight. Meanwhile, we inject two types of perturbations into the retrieved pairs for robust training. Experimental results on four benchmark datasets demonstrate that our model not only achieves significant improvements over current kNN-MT models, but also exhibits better robustness. Our code is available at https://github.com/DeepLearnXMU/Robust-knn-mt.
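The two components named in the abstract, the kNN distribution and the interpolation weight, can be sketched in their basic form as follows. This is a minimal illustration of plain kNN-MT with a fixed weight, not the confidence-enhanced model the paper proposes; the function names, the temperature value, and the toy neighbors are all assumptions made for the example.

```python
import math
from collections import defaultdict


def knn_distribution(neighbors, temperature=10.0):
    """Turn retrieved (distance, target_token) pairs into a distribution.

    Closer neighbors get more weight via a softmax over negative distances.
    """
    weights = [math.exp(-dist / temperature) for dist, _ in neighbors]
    total = sum(weights)
    probs = defaultdict(float)
    for weight, (_, token) in zip(weights, neighbors):
        probs[token] += weight / total
    return dict(probs)


def interpolate(p_nmt, p_knn, lam):
    """Mix the two distributions: lam * p_knn + (1 - lam) * p_nmt."""
    vocab = set(p_nmt) | set(p_knn)
    return {
        tok: lam * p_knn.get(tok, 0.0) + (1.0 - lam) * p_nmt.get(tok, 0.0)
        for tok in vocab
    }


# Toy example: three retrieved neighbors at equal distance (hypothetical data).
neighbors = [(1.0, "cat"), (1.0, "cat"), (1.0, "dog")]
p_knn = knn_distribution(neighbors)
p_nmt = {"cat": 0.5, "dog": 0.5}
p_final = interpolate(p_nmt, p_knn, lam=0.5)
```

With a fixed `lam`, a noisy neighbor set shifts the final distribution regardless of how sure the NMT model is, which is the weakness the abstract attributes to not fully exploiting the NMT prediction.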


Discrete Cross-Modal Alignment Enables Zero-Shot Speech Translation

arXiv.org Artificial Intelligence

End-to-end Speech Translation (ST) aims at translating the source language speech into target language text without generating the intermediate transcriptions. However, the training of end-to-end methods relies on parallel ST data, which are difficult and expensive to obtain. Fortunately, the supervised data for automatic speech recognition (ASR) and machine translation (MT) are usually more accessible, making zero-shot speech translation a potential direction. Existing zero-shot methods fail to align the two modalities of speech and text into a shared semantic space, resulting in much worse performance compared to the supervised ST methods. In order to enable zero-shot ST, we propose a novel Discrete Cross-Modal Alignment (DCMA) method that employs a shared discrete vocabulary space to accommodate and match both modalities of speech and text. Specifically, we introduce a vector quantization module to discretize the continuous representations of speech and text into a finite set of virtual tokens, and use ASR data to map corresponding speech and text to the same virtual token in a shared codebook. This way, source language speech can be embedded in the same semantic space as the source language text, which can then be transformed into target language text with an MT module. Experiments on multiple language pairs demonstrate that our zero-shot ST method significantly improves the SOTA, and even performs on par with the strong supervised ST baselines.
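The vector quantization step described above can be sketched as a nearest-neighbor lookup in a shared codebook. This is only the lookup; how DCMA trains the codebook and the encoders is not covered by the abstract. The codebook values, vector sizes, and the names `squared_distance` and `quantize` are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(vector, codebook):
    """Return the index of the codebook entry closest to the vector.

    The index plays the role of a 'virtual token' shared by both modalities.
    """
    return min(range(len(codebook)), key=lambda i: squared_distance(vector, codebook[i]))


# A toy shared codebook with three 2-dimensional entries (made-up values).
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]

# Continuous representations of the same content from two encoders (made-up values).
speech_vec = [0.9, 1.2]
text_vec = [1.1, 0.8]

speech_token = quantize(speech_vec, codebook)
text_token = quantize(text_vec, codebook)
```

When corresponding speech and text vectors land on the same codebook index, a downstream MT module can treat both inputs alike.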


Pseudo-OOD training for robust language models

arXiv.org Artificial Intelligence

Detecting Out-of-Distribution (OOD) (Goodfellow et al., 2014; Hendrycks and Gimpel, 2016; Yang et al., 2021) samples is vital for developing reliable machine learning systems for various industry-scale applications of natural language understanding (NLP) (Shen et al., 2019; Sundararaman et al., 2020) including intent understanding in conversational dialogues (Zheng et al., 2020; Li et al., 2017), language translation (Denkowski and Lavie, 2011; Sundararaman et al., 2019), and text classification (Aggarwal and Zhai, 2012; Sundararaman et al., 2022). For instance, a language understanding model deployed to support a chat [...]

Motivated by the above limitations, we propose a framework called POsthoc pseudo Ood REgularization (POORE) that generates pseudo-OOD data using the trained classifier and the In-Distribution (IND) samples. As opposed to methods that use outlier exposure, our framework doesn't rely on any external OOD set. Moreover, POORE can be easily applied to already deployed large-scale models trained on a classification task, without requiring to re-train the classifier from scratch. In summary, we make the following contributions: 1. We propose a Mahalanobis-based context masking scheme for generating pseudo-OOD samples that can be used during the finetuning.
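The excerpt names a Mahalanobis-based scheme without giving details, so the following is only a sketch of the Mahalanobis distance itself, restricted to two dimensions so the covariance inverse can be written out by hand. How POORE uses such distances to choose which context to mask is not stated here; the function name and the example values are assumptions.

```python
import math


def mahalanobis_2d(x, mean, cov):
    """Mahalanobis distance of a 2-D point from a distribution.

    cov is a 2x2 covariance matrix [[a, b], [c, d]]; its inverse is
    (1 / det) * [[d, -b], [-c, a]].
    """
    dx = x[0] - mean[0]
    dy = x[1] - mean[1]
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    quad = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.sqrt(quad)


# With an identity covariance the distance equals the Euclidean distance.
d_identity = mahalanobis_2d([3.0, 4.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])

# A larger variance along the first axis shrinks distances along that axis.
d_scaled = mahalanobis_2d([2.0, 0.0], [0.0, 0.0], [[4.0, 0.0], [0.0, 1.0]])
```

Unlike a plain Euclidean distance, this measure accounts for how spread out the in-distribution features are in each direction, which is why it is a common choice for scoring how far a sample lies from the training data.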


How Positional Encoding Works (Transformers), Part 2

#artificialintelligence

Abstract: Adapting Deep Learning (DL) techniques to automate non-trivial coding activities, such as code documentation and defect detection, has been intensively studied recently. Learning to predict code changes is one of the popular and essential investigations. Prior studies have shown that DL techniques such as Neural Machine Translation (NMT) can benefit meaningful code changes, including bug fixing and code refactoring. However, NMT models may encounter a bottleneck when modeling long sequences, and are thus limited in accurately predicting code changes. In this work, we design a Transformer-based approach, considering that the Transformer has proven effective in capturing long-term dependencies.


How to boost your business with natural language processing (NLP)

#artificialintelligence

Natural language processing (NLP) is a powerful combination of linguistics and computer science that, through the study of language and the creation of intelligent systems, makes human language as intelligible to machines as it would be for a human being, whether in text or speech format. As a branch of artificial intelligence (AI), NLP enables computers and machines to understand, interpret and manipulate human language using computational linguistics and statistical models, machine learning methods and deep learning processes. The knowledge extracted by these technologies is converted into algorithms that teach machines to perform a myriad of tasks that are infinitely valuable to businesses. The more data NLP algorithms receive, the more precise text analysis models become. NLP includes an immense diversity of techniques, from statistical and machine learning methods to algorithmic and rule-based approaches.


Increasing Visual Awareness in Multimodal Neural Machine Translation from an Information Theoretic Perspective

arXiv.org Artificial Intelligence

Multimodal machine translation (MMT) aims to improve translation quality by equipping the source sentence with its corresponding image. Despite the promising performance, MMT models still suffer from the problem of input degradation: models focus more on textual information while visual information is generally overlooked. In this paper, we endeavor to improve MMT performance by increasing visual awareness from an information theoretic perspective. In detail, we decompose the informative visual signals into two parts: source-specific information and target-specific information. We use mutual information to quantify them and propose two methods for objective optimization to better leverage visual signals. Experiments on two datasets demonstrate that our approach can effectively enhance the visual awareness of MMT models and achieve superior results against strong baselines.
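Since the abstract relies on mutual information to quantify visual signals, a small worked example of the textbook definition for discrete variables may help. This is not the estimator or objective the paper optimizes, which the abstract does not describe; the function name and the toy joint distributions are assumptions.

```python
import math


def mutual_information(joint):
    """Mutual information I(X; Y) in nats from a joint probability table.

    joint[i][j] is p(X = i, Y = j); entries are assumed to sum to 1.
    """
    p_x = [sum(row) for row in joint]
    p_y = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:
                mi += p * math.log(p / (p_x[i] * p_y[j]))
    return mi


# Independent variables share no information.
mi_independent = mutual_information([[0.25, 0.25], [0.25, 0.25]])

# Perfectly dependent binary variables share log(2) nats.
mi_dependent = mutual_information([[0.5, 0.0], [0.0, 0.5]])
```

In the MMT setting, a value near zero between the image and the translation would correspond to the input degradation problem the abstract describes, where the visual signal is effectively ignored.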


RedApt: An Adaptor for wav2vec 2 Encoding Faster and Smaller Speech Translation without Quality Compromise

arXiv.org Artificial Intelligence

Pre-trained speech Transformers in speech translation (ST) have facilitated state-of-the-art (SotA) results; yet, using such encoders is computationally expensive. To improve this, we present a novel Reducer Adaptor block, RedApt, that could be seamlessly integrated within any Transformer-based speech encoding architecture. Integrating the pretrained wav2vec 2 speech encoder with RedApt brings 41% speedup, 33% memory reduction with 24% fewer FLOPs at inference. To our positive surprise, our ST model with RedApt outperforms the SotA architecture by an average of 0.68 BLEU score on 8 language pairs from Must-C.


Brief Review -- Unsupervised Machine Translation Using Monolingual Corpora Only

#artificialintelligence

Using the GAN idea, an NMT model can be trained without parallel data, which I think is similar to CycleGAN in the image domain. 2013 … 2018 [UMNT] … 2020 [Batch Augment, BA] [GPT-3] [T5]…