Goto

Collaborating Authors

 Machine Translation


STEER: Unified Style Transfer with Expert Reinforcement

arXiv.org Artificial Intelligence

While text style transfer has many applications across natural language processing, the core premise of transferring from a single source style is unrealistic in a real-world setting. In this work, we focus on arbitrary style transfer: rewriting a text from an arbitrary, unknown style to a target style. We propose STEER: Unified Style Transfer with Expert Reinforcement, a unified frame-work developed to overcome the challenge of limited parallel data for style transfer. STEER involves automatically generating a corpus of style-transfer pairs using a product of experts during decoding. The generated offline data is then used to pre-train an initial policy before switching to online, off-policy reinforcement learning for further improvements via fine-grained reward signals. STEER is unified and can transfer to multiple target styles from an arbitrary, unknown source style, making it particularly flexible and efficient. Experimental results on a challenging dataset with text from a diverse set of styles demonstrate state-of-the-art results compared to competitive baselines. Remarkably, STEER outperforms the 175B parameter instruction-tuned GPT-3 on overall style transfer quality, despite being 226 times smaller in size. We also show STEER is robust, maintaining its style transfer capabilities on out-of-domain data, and surpassing nearly all baselines across various styles. The success of our method highlights the potential of RL algorithms when augmented with controllable decoding to overcome the challenge of limited data supervision.


Citance-Contextualized Summarization of Scientific Papers

arXiv.org Artificial Intelligence

Current approaches to automatic summarization of scientific papers generate informative summaries in the form of abstracts. However, abstracts are not intended to show the relationship between a paper and the references cited in it. We propose a new contextualized summarization approach that can generate an informative summary conditioned on a given sentence containing the citation of a reference (a so-called "citance"). This summary outlines the content of the cited paper relevant to the citation location. Thus, our approach extracts and models the citances of a paper, retrieves relevant passages from cited papers, and generates abstractive summaries tailored to each citance. We evaluate our approach using $\textbf{Webis-Context-SciSumm-2023}$, a new dataset containing 540K~computer science papers and 4.6M~citances therein.


Data Augmentation for Neural Machine Translation using Generative Language Model

arXiv.org Artificial Intelligence

Neural Machine Translation(NMT) is the task of Through experiments, we examine that appropriate converting a sentence written in a source language prompts can reduce the generation cost of the into a target language sentence by using a translation synthetic data and facilitate the easy transfer of model. NMT models usually require vast knowledge from large-scale language models. We amounts of parallel data for training, but highquality also validate the effectiveness of the proposed 3 parallel data is often scarce. Since generating prompts through measure the diversity of generated parallel synthetic data demands substantial time synthetic data by each method. Via comparing and cost, especially for low-resource languages or the diversity, we demonstrate that generating domains, the problem becomes particularly severe various data is a crucial factor in synthetic data in such cases.


Context Consistency between Training and Testing in Simultaneous Machine Translation

arXiv.org Artificial Intelligence

Simultaneous Machine Translation (SiMT) aims to yield a real-time partial translation with a monotonically growing the source-side context. However, there is a counterintuitive phenomenon about the context usage between training and testing: e.g., the wait-k testing model consistently trained with wait-k is much worse than that model inconsistently trained with wait-k' (k' is not equal to k) in terms of translation quality. To this end, we first investigate the underlying reasons behind this phenomenon and uncover the following two factors: 1) the limited correlation between translation quality and training (cross-entropy) loss; 2) exposure bias between training and testing. Based on both reasons, we then propose an effective training approach called context consistency training accordingly, which makes consistent the context usage between training and testing by optimizing translation quality and latency as bi-objectives and exposing the predictions to the model during the training. The experiments on three language pairs demonstrate our intuition: our system encouraging context consistency outperforms that existing systems with context inconsistency for the first time, with the help of our context consistency training approach.


Sharing, Teaching and Aligning: Knowledgeable Transfer Learning for Cross-Lingual Machine Reading Comprehension

arXiv.org Artificial Intelligence

In cross-lingual language understanding, machine translation is often utilized to enhance the transferability of models across languages, either by translating the training data from the source language to the target, or from the target to the source to aid inference. However, in cross-lingual machine reading comprehension (MRC), it is difficult to perform a deep level of assistance to enhance cross-lingual transfer because of the variation of answer span positions in different languages. In this paper, we propose X-STA, a new approach for cross-lingual MRC. Specifically, we leverage an attentive teacher to subtly transfer the answer spans of the source language to the answer output space of the target. A Gradient-Disentangled Knowledge Sharing technique is proposed as an improved cross-attention block. In addition, we force the model to learn semantic alignments from multiple granularities and calibrate the model outputs with teacher guidance to enhance cross-lingual transferability. Experiments on three multi-lingual MRC datasets show the effectiveness of our method, outperforming state-of-the-art approaches.


Simple and Effective Input Reformulations for Translation

arXiv.org Artificial Intelligence

Foundation language models learn from their finetuning input context in different ways. In this paper, we reformulate inputs during finetuning for challenging translation tasks, leveraging model strengths from pretraining in novel ways to improve downstream performance. These reformulations are simple data level modifications, require no additional collection of training data or modification of data at inference time. They can be applied either on single language pair translation tasks or massively multilingual translation tasks. Experiments with these techniques demonstrate significant performance improvements up to $\textbf{3.5 chrF++ on the Flores200 translation benchmark}$. We hope our research accessibly improves finetuning data efficiency, enabling more effective training to scalably improve state-of-the-art performance. Our code is released $\href{https://github.com/bri25yu/LanguageModelExperimentation}{here}.$


Don't Overlook the Grammatical Gender: Bias Evaluation for Hindi-English Machine Translation

arXiv.org Artificial Intelligence

Neural Machine Translation (NMT) models, though state-of-the-art for translation, often reflect social biases, particularly gender bias. Existing evaluation benchmarks primarily focus on English as the source language of translation. For source languages other than English, studies often employ gender-neutral sentences for bias evaluation, whereas real-world sentences frequently contain gender information in different forms. Therefore, it makes more sense to evaluate for bias using such source sentences to determine if NMT models can discern gender from the grammatical gender cues rather than relying on biased associations. To illustrate this, we create two gender-specific sentence sets in Hindi to automatically evaluate gender bias in various Hindi-English (HI-EN) NMT systems. We emphasise the significance of tailoring bias evaluation test sets to account for grammatical gender markers in the source language.


Added Toxicity Mitigation at Inference Time for Multimodal and Massively Multilingual Translation

arXiv.org Artificial Intelligence

Added toxicity in the context of translation refers to the fact of producing a translation output with more toxicity than there exists in the input. In this paper, we present MinTox which is a novel pipeline to identify added toxicity and mitigate this issue which works at inference time. MinTox uses a toxicity detection classifier which is multimodal (speech and text) and works in languages at scale. The mitigation method is applied to languages at scale and directly in text outputs. MinTox is applied to SEAMLESSM4T, which is the latest multimodal and massively multilingual machine translation system. For this system, MinTox achieves significant added toxicity mitigation across domains, modalities and language directions. MinTox manages to approximately filter out from 25% to 95% of added toxicity (depending on the modality and domain) while keeping translation quality.


Memorisation Cartography: Mapping out the Memorisation-Generalisation Continuum in Neural Machine Translation

arXiv.org Artificial Intelligence

When training a neural network, it will quickly memorise some source-target mappings from your dataset but never learn some others. Yet, memorisation is not easily expressed as a binary feature that is good or bad: individual datapoints lie on a memorisation-generalisation continuum. What determines a datapoint's position on that spectrum, and how does that spectrum influence neural models' performance? We address these two questions for neural machine translation (NMT) models. We use the counterfactual memorisation metric to (1) build a resource that places 5M NMT datapoints on a memorisation-generalisation map, (2) illustrate how the datapoints' surface-level characteristics and a models' per-datum training signals are predictive of memorisation in NMT, (3) and describe the influence that subsets of that map have on NMT systems' performance.


There's no Data Like Better Data: Using QE Metrics for MT Data Filtering

arXiv.org Artificial Intelligence

Quality Estimation (QE), the evaluation of machine translation output without the need of explicit references, has seen big improvements in the last years with the use of neural metrics. In this paper we analyze the viability of using QE metrics for filtering out bad quality sentence pairs in the training data of neural machine translation systems~(NMT). While most corpus filtering methods are focused on detecting noisy examples in collections of texts, usually huge amounts of web crawled data, QE models are trained to discriminate more fine-grained quality differences. We show that by selecting the highest quality sentence pairs in the training data, we can improve translation quality while reducing the training size by half. We also provide a detailed analysis of the filtering results, which highlights the differences between both approaches.