Goto

Collaborating Authors

 Machine Translation


HW-TSC's Submission to the CCMT 2024 Machine Translation Tasks

arXiv.org Artificial Intelligence

This paper presents the submission of Huawei Translation Services Center (HW-TSC) to machine translation tasks of the 20th China Conference on Machine Translation (CCMT 2024). We participate in the bilingual machine translation task and multi-domain machine translation task. For these two translation tasks, we use training strategies such as regularized dropout, bidirectional training, data diversification, forward translation, back translation, alternated training, curriculum learning, and transductive ensemble learning to train neural machine translation (NMT) models based on the deep Transformerbig architecture. Furthermore, to explore whether large language model (LLM) can effectively improve the translation quality of NMT models, we use supervised fine-tuning (SFT) to train llama2-13b as an Automatic post-editing (APE) model to improve the translation results of the NMT model on the multi-domain machine translation task. By using these plyometric strategies, our submission achieves a competitive result in the final evaluation.


Reviews: Decoding with Value Networks for Neural Machine Translation

Neural Information Processing Systems

This paper addresses one of the limitation of NMT, the so-called exposure bias, that results from the fact that each word is chosen greedily. For this, the authors build on standard technique of reinforcement learning and try to predict, for each outgoing transition of a given state, the expected reward that will be achieved if the system take this transition. The article is overall very clear and the proposed ideas quite appealing, even if many of the decisions seem quite ad hoc (e.g. More importantly, several implementation "details" are not specified. For instance, in Equation (6), the BLEU function is defined at the sentence level while in the actual BLEU metric is defined at the corpus level.


Reviews: Layer-Wise Coordination between Encoder and Decoder for Neural Machine Translation

Neural Information Processing Systems

Original Review: This work builds directly off of Transformer networks. They make two contributions to that kind of architecture. The first is to suggest running the encoder and decoder stacks layer by layer instead of running the encoder stack and passing information to the decoder stack. The second is to actually tie the weights of the encoder and decoder. Running a decoder layer right after its corresponding encoder layer processes (rather than running the next encoder layer) is also an interesting augmentation to Transformer networks.


Reviews: e-SNLI: Natural Language Inference with Natural Language Explanations

Neural Information Processing Systems

I think the idea of explicable models is worth pursuing, and this is a decent contribution to showing how one might do that. It is unfortunate that this work shows a huge tradeoff between models that perform at high levels and those that explain well (from 4.1 it seems like we can get good performance, but then can't generate correct explanations very often and from 4.2 we can generate correct explanations more often at the expense of good performance). It also seems disappointing that the BLEU scores in the PREDICT setting are already so close to the inter-annotator agreement even though they are not correct explanations very often; this seems to suggest that we really do need to rely on the percent correct given by human evaluation and that the BLEU scores are not very meaningful. This seems like a bottleneck for this resource being widely adopted. Nonetheless, these findings are a solid contribution and so is the data if others are willing to do human evaluation or work on a new automatic metric for a task like this.


Reviews: Navigating with Graph Representations for Fast and Scalable Decoding of Neural Language Models

Neural Information Processing Systems

Update after author response: Thanks for the detailed response! It's a strong submission and I vote for an accept. This paper aims to speed up the computation of the softmax over a large vocabulary, which is quite common in some NLP tasks like e.g., language modeling. Specifically, the proposed method formulates the problem into a nearest neighbor search in a small world graph, and applies a log time algorithm to find the approximate top K predictions. The resulting time complexity reduces to logarithmic in the vocabulary size in expectation, in contrast to the linear one in a standard softmax.


A test suite of prompt injection attacks for LLM-based machine translation

arXiv.org Artificial Intelligence

LLM-based NLP systems typically work by embedding their input data into prompt templates which contain instructions and/or in-context examples, creating queries which are submitted to a LLM, and then parsing the LLM response in order to generate the system outputs. Prompt Injection Attacks (PIAs) are a type of subversion of these systems where a malicious user crafts special inputs which interfere with the prompt templates, causing the LLM to respond in ways unintended by the system designer. Recently, Sun and Miceli-Barone proposed a class of PIAs against LLM-based machine translation. Specifically, the task is to translate questions from the TruthfulQA test suite, where an adversarial prompt is prepended to the questions, instructing the system to ignore the translation instruction and answer the questions instead. In this test suite, we extend this approach to all the language pairs of the WMT 2024 General Machine Translation task. Moreover, we include additional attack formats in addition to the one originally studied.


Computational design of target-specific linear peptide binders with TransformerBeta

arXiv.org Artificial Intelligence

The computational prediction and design of peptide binders targeting specific linear epitopes is crucial in biological and biomedical research, yet it remains challenging due to their highly dynamic nature and the scarcity of experimentally solved binding data. To address this problem, we built an unprecedentedly large-scale library of peptide pairs within stable secondary structures (beta sheets), leveraging newly available AlphaFold predicted structures. We then developed a machine learning method based on the Transformer architecture for the design of specific linear binders, in analogy to a language translation task. Our method, TransformerBeta, accurately predicts specific beta strand interactions and samples sequences with beta sheet-like molecular properties, while capturing interpretable physico-chemical interaction patterns. As such, it can propose specific candidate binders targeting linear epitope for experimental validation to inform protein design.


On Instruction-Finetuning Neural Machine Translation Models

arXiv.org Artificial Intelligence

In this work, we introduce instruction finetuning for Neural Machine Translation (NMT) models, which distills instruction following capabilities from Large Language Models (LLMs) into orders-of-magnitude smaller NMT models. Our instruction-finetuning recipe for NMT models enables customization of translations for a limited but disparate set of translation-specific tasks. We show that NMT models are capable of following multiple instructions simultaneously and demonstrate capabilities of zero-shot composition of instructions. We also show that through instruction finetuning, traditionally disparate tasks such as formality-controlled machine translation, multi-domain adaptation as well as multi-modal translations can be tackled jointly by a single instruction finetuned NMT model, at a performance level comparable to LLMs such as GPT-3.5-Turbo. To the best of our knowledge, our work is among the first to demonstrate the instruction-following capabilities of traditional NMT models, which allows for faster, cheaper and more efficient serving of customized translations.


Neural machine translation system for Lezgian, Russian and Azerbaijani languages

arXiv.org Artificial Intelligence

We release the first neural machine translation system for translation between Russian, Azerbaijani and the endangered Lezgian languages, as well as monolingual and parallel datasets collected and aligned for training and evaluating the system. Multiple experiments are conducted to identify how different sets of training language pairs and data domains can influence the resulting translation quality. We achieve BLEU scores of 26.14 for Lezgian-Azerbaijani, 22.89 for Azerbaijani-Lezgian, 29.48 for Lezgian-Russian and 24.25 for Russian-Lezgian pairs. The quality of zero-shot translation is assessed on a Large Language Model, showing its high level of fluency in Lezgian. However, the model often refuses to translate, justifying itself with its incompetence. We contribute our translation model along with the collected parallel and monolingual corpora and sentence encoder for the Lezgian language.


Beyond Correlation: Interpretable Evaluation of Machine Translation Metrics

arXiv.org Artificial Intelligence

Machine Translation (MT) evaluation metrics assess translation quality automatically. Recently, researchers have employed MT metrics for various new use cases, such as data filtering and translation re-ranking. However, most MT metrics return assessments as scalar scores that are difficult to interpret, posing a challenge to making informed design choices. Moreover, MT metrics' capabilities have historically been evaluated using correlation with human judgment, which, despite its efficacy, falls short of providing intuitive insights into metric performance, especially in terms of new metric use cases. To address these issues, we introduce an interpretable evaluation framework for MT metrics. Within this framework, we evaluate metrics in two scenarios that serve as proxies for the data filtering and translation re-ranking use cases. Furthermore, by measuring the performance of MT metrics using Precision, Recall, and F-score, we offer clearer insights into their capabilities than correlation with human judgments. Finally, we raise concerns regarding the reliability of manually curated data following the Direct Assessments+Scalar Quality Metrics (DA+SQM) guidelines, reporting a notably low agreement with Multidimensional Quality Metrics (MQM) annotations.