Machine Translation
Speech Translation with Foundation Models and Optimal Transport: UPC at IWSLT23
Tsiamas, Ioannis, Gállego, Gerard I., Fonollosa, José A. R., Costa-jussà, Marta R.
Gállego et al. (2021); Zhao et al. (2022) aimed to Han et al. (2021) tackled the issue by projecting speech and text features In the past decade, the field of Speech Translation (ST) has seen significant advancements, mainly In our work, we tackle the issue of misaligned due to end-to-end models that directly translate speech and text encoder representations by adopting speech, offering a more efficient method compared the approach proposed by Le et al. (2023). Despite data availability challenges, recent on English ASR, wav2vec 2.0 (Baevski et al., progress has diminished the performance disparity 2020), and an MT foundation model fine-tuned between these approaches (Bentivogli et al., 2021; on multilingual MT (En-Xx), mBART50 (Tang Potapczyk and Przybysz, 2020; Inaguma et al., et al., 2020), as described in Section 2.1.
Text Style Transfer Back-Translation
Wei, Daimeng, Wu, Zhanglin, Shang, Hengchao, Li, Zongyao, Wang, Minghan, Guo, Jiaxin, Chen, Xiaoyu, Yu, Zhengzhe, Yang, Hao
Back Translation (BT) is widely used in the field of machine translation, as it has been proved effective for enhancing translation quality. However, BT mainly improves the translation of inputs that share a similar style (to be more specific, translation-like inputs), since the source side of BT data is machine-translated. For natural inputs, BT brings only slight improvements and sometimes even adverse effects. To address this issue, we propose Text Style Transfer Back Translation (TST BT), which uses a style transfer model to modify the source side of BT data. By making the style of source-side text more natural, we aim to improve the translation of natural inputs. Our experiments on various language pairs, including both high-resource and low-resource ones, demonstrate that TST BT significantly improves translation performance against popular BT benchmarks. In addition, TST BT is proved to be effective in domain adaptation so this strategy can be regarded as a general data augmentation method. Our training code and text style transfer model are open-sourced.
Automatic Translation of Hate Speech to Non-hate Speech in Social Media Texts
Kostiuk, Yevhen, Tonja, Atnafu Lambebo, Sidorov, Grigori, Kolesnikova, Olga
In this paper, we investigate the issue of hate speech by presenting a novel task of translating hate speech into non-hate speech text while preserving its meaning. As a case study, we use Spanish texts. We provide a dataset and several baselines as a starting point for further research in the task. We evaluated our baseline results using multiple metrics, including BLEU scores. The aim of this study is to contribute to the development of more effective methods for reducing the spread of hate speech in online communities.
Exploring Better Text Image Translation with Multimodal Codebook
Lan, Zhibin, Yu, Jiawei, Li, Xiang, Zhang, Wen, Luan, Jian, Wang, Bin, Huang, Degen, Su, Jinsong
Text image translation (TIT) aims to translate the source texts embedded in the image to target translations, which has a wide range of applications and thus has important research value. However, current studies on TIT are confronted with two main bottlenecks: 1) this task lacks a publicly available TIT dataset, 2) dominant models are constructed in a cascaded manner, which tends to suffer from the error propagation of optical character recognition (OCR). In this work, we first annotate a Chinese-English TIT dataset named OCRMT30K, providing convenience for subsequent studies. Then, we propose a TIT model with a multimodal codebook, which is able to associate the image with relevant texts, providing useful supplementary information for translation. Moreover, we present a multi-stage training framework involving text machine translation, image-text alignment, and TIT tasks, which fully exploits additional bilingual texts, OCR dataset and our OCRMT30K dataset to train our model. Extensive experiments and in-depth analyses strongly demonstrate the effectiveness of our proposed model and training framework.
Revisiting Non-Autoregressive Translation at Scale
Wang, Zhihao, Wang, Longyue, Su, Jinsong, Yao, Junfeng, Tu, Zhaopeng
In real-world systems, scaling has been critical for improving the translation quality in autoregressive translation (AT), which however has not been well studied for non-autoregressive translation (NAT). In this work, we bridge the gap by systematically studying the impact of scaling on NAT behaviors. Extensive experiments on six WMT benchmarks over two advanced NAT models show that scaling can alleviate the commonly-cited weaknesses of NAT models, resulting in better translation performance. To reduce the side-effect of scaling on decoding speed, we empirically investigate the impact of NAT encoder and decoder on the translation performance. Experimental results on the large-scale WMT20 En-De show that the asymmetric architecture (e.g. bigger encoder and smaller decoder) can achieve comparable performance with the scaling model, while maintaining the superiority of decoding speed with standard NAT models. To this end, we establish a new benchmark by validating scaled NAT models on the scaled dataset, which can be regarded as a strong baseline for future works. We release code and system outputs at https://github.com/DeepLearnXMU/Scaling4NAT.
Optimizing Non-Autoregressive Transformers with Contrastive Learning
An, Chenxin, Feng, Jiangtao, Huang, Fei, Qiu, Xipeng, Kong, Lingpeng
Non-autoregressive Transformers (NATs) reduce the inference latency of Autoregressive Transformers (ATs) by predicting words all at once rather than in sequential order. They have achieved remarkable progress in machine translation as well as many other applications. However, a long-standing challenge for NATs is the learning of multi-modality data distribution, which is the main cause of the performance gap between NATs and ATs. In this paper, we propose to ease the difficulty of modality learning via sampling from the model distribution instead of the data distribution. We derive contrastive constraints to stabilize the training process and integrate this resulting objective with the state-of-the-art NAT architecture DA-Transformer. Our model \method is examined on 3 different tasks, including machine translation, text summarization, and paraphrasing with 5 benchmarks. Results show that our approach outperforms previous non-autoregressive baselines by a significant margin and establishes new state-of-the-art results for non-autoregressive transformers on all the benchmarks.
Easy Guided Decoding in Providing Suggestions for Interactive Machine Translation
Wang, Ke, Ge, Xin, Wang, Jiayi, Zhao, Yu, Zhang, Yuqi
Machine translation technology has made great progress in recent years, but it cannot guarantee error free results. Human translators perform post editing on machine translations to correct errors in the scene of computer aided translation. In favor of expediting the post editing process, many works have investigated machine translation in interactive modes, in which machines can automatically refine the rest of translations constrained by human's edits. Translation Suggestion (TS), as an interactive mode to assist human translators, requires machines to generate alternatives for specific incorrect words or phrases selected by human translators. In this paper, we utilize the parameterized objective function of neural machine translation (NMT) and propose a novel constrained decoding algorithm, namely Prefix Suffix Guided Decoding (PSGD), to deal with the TS problem without additional training. Compared to the state of the art lexically constrained decoding method, PSGD improves translation quality by an average of $10.87$ BLEU and $8.62$ BLEU on the WeTS and the WMT 2022 Translation Suggestion datasets, respectively, and reduces decoding time overhead by an average of 63.4% tested on the WMT translation datasets. Furthermore, on both of the TS benchmark datasets, it is superior to other supervised learning systems trained with TS annotated data.
Learning the Relation between Code Features and Code Transforms with Structured Prediction
Yu, Zhongxing, Martinez, Matias, Chen, Zimin, Bissyandé, Tegawendé F., Monperrus, Martin
To effectively guide the exploration of the code transform space for automated code evolution techniques, we present in this paper the first approach for structurally predicting code transforms at the level of AST nodes using conditional random fields (CRFs). Our approach first learns offline a probabilistic model that captures how certain code transforms are applied to certain AST nodes, and then uses the learned model to predict transforms for arbitrary new, unseen code snippets. {Our approach involves a novel representation of both programs and code transforms. Specifically, we introduce the formal framework for defining the so-called AST-level code transforms and we demonstrate how the CRF model can be accordingly designed, learned, and used for prediction}. We instantiate our approach in the context of repair transform prediction for Java programs. Our instantiation contains a set of carefully designed code features, deals with the training data imbalance issue, and comprises transform constraints that are specific to code. We conduct a large-scale experimental evaluation based on a dataset of bug fixing commits from real-world Java projects. The results show that when the popular evaluation metric \emph{top-3} is used, our approach predicts the code transforms with an accuracy varying from 41\% to 53\% depending on the transforms. Our model outperforms two baselines based on history probability and neural machine translation (NMT), suggesting the importance of considering code structure in achieving good prediction accuracy. In addition, a proof-of-concept synthesizer is implemented to concretize some repair transforms to get the final patches. The evaluation of the synthesizer on the Defects4j benchmark confirms the usefulness of the predicted AST-level repair transforms in producing high-quality patches.
Arithmetic Sampling: Parallel Diverse Decoding for Large Language Models
Vilnis, Luke, Zemlyanskiy, Yury, Murray, Patrick, Passos, Alexandre, Sanghai, Sumit
Methods such as beam search and Gumbel top-k sampling can guarantee a different output for each element of the beam, but are not easy to parallelize. Alternatively, methods such as temperature sampling and its modifications (top-k sampling, nucleus sampling, typical decoding, and others), are embarrassingly parallel, but have no guarantees about duplicate samples. We present a framework for sampling according to an arithmetic code book implicitly defined by a large language model, compatible with common sampling variations, with provable beam diversity under certain conditions, as well as being embarrassingly parallel and providing unbiased and consistent expectations from the original model. We demonstrate the effectiveness of Figure 1: Sequence model over sequences of length two our approach on WMT machine translation, and a vocabulary of three symbols mapping points in the more than halving the standard deviation when unit interval to each sequence. An even lattice of code estimating expected BLEU score reward, and points parallelizes decoding into diverse high-probability closing the BLEU score gap between independent sequences.
SQuId: Measuring Speech Naturalness in Many Languages
Sellam, Thibault, Bapna, Ankur, Camp, Joshua, Mackinnon, Diana, Parikh, Ankur P., Riesa, Jason
Much of text-to-speech research relies on human evaluation, which incurs heavy costs and slows down the development process. The problem is particularly acute in heavily multilingual applications, where recruiting and polling judges can take weeks. We introduce SQuId (Speech Quality Identification), a multilingual naturalness prediction model trained on over a million ratings and tested in 65 locales-the largest effort of this type to date. The main insight is that training one model on many locales consistently outperforms mono-locale baselines. We present our task, the model, and show that it outperforms a competitive baseline based on w2v-BERT and VoiceMOS by 50.0%. We then demonstrate the effectiveness of cross-locale transfer during fine-tuning and highlight its effect on zero-shot locales, i.e., locales for which there is no fine-tuning data. Through a series of analyses, we highlight the role of non-linguistic effects such as sound artifacts in cross-locale transfer. Finally, we present the effect of our design decision, e.g., model size, pre-training diversity, and language rebalancing with several ablation experiments.