Machine Translation
Exploring Data Augmentation for Code Generation Tasks
Chen, Pinzhen, Lampouras, Gerasimos
Advances in natural language processing, such as transfer learning from pre-trained language models, have impacted how models are trained for programming language tasks too. Previous research primarily explored code pre-training and expanded it through multi-modality and multi-tasking, yet the data for downstream tasks remain modest in size. Focusing on data utilization for downstream tasks, we propose and adapt augmentation methods that yield consistent improvements in code translation and summarization by up to 6.9% and 7.5% respectively. Further analysis suggests that our methods work orthogonally and show benefits in output code style and numeric consistency. We also discuss test data imperfections.
"Unlocking the Potential of Machine Translation Through Dataset Training, Validation, andโฆ
The coronavirus pandemic has changed the way we live, work, and interact with each other. We've all had to make adjustments to the way we do things, including the way we shop. We're now seeing a shift towards contactless and digital payments, which has made it easier for us to stay safe and healthy while still being able to purchase the items we need. Contactless payments have become increasingly popular during the pandemic and offer a range of benefits. Not only are they faster, more convenient, and more secure than traditional payment methods, but they also provide an extra layer of protection from the virus.
Advances in Automatically Rating the Trustworthiness of Text Processing Services
Srivastava, Biplav, Lakkaraju, Kausik, Bernagozzi, Mariana, Valtorta, Marco
AI services are known to have unstable behavior when subjected to changes in data, models or users. Such behaviors, whether triggered by omission or commission, lead to trust issues when AI works with humans. The current approach of assessing AI services in a black box setting, where the consumer does not have access to the AI's source code or training data, is limited. The consumer has to rely on the AI developer's documentation and trust that the system has been built as stated. Further, if the AI consumer reuses the service to build other services which they sell to their customers, the consumer is at the risk of the service providers (both data and model providers). Our approach, in this context, is inspired by the success of nutritional labeling in food industry to promote health and seeks to assess and rate AI services for trust from the perspective of an independent stakeholder. The ratings become a means to communicate the behavior of AI systems so that the consumer is informed about the risks and can make an informed decision. In this paper, we will first describe recent progress in developing rating methods for text-based machine translator AI services that have been found promising with user studies. Then, we will outline challenges and vision for a principled, multi-modal, causality-based rating methodologies and its implication for decision-support in real-world scenarios like health and food recommendation.
CoNT: Contrastive Neural Text Generation
An, Chenxin, Feng, Jiangtao, Lv, Kai, Kong, Lingpeng, Qiu, Xipeng, Huang, Xuanjing
Recently, contrastive learning attracts increasing interests in neural text generation as a new solution to alleviate the exposure bias problem. It introduces a sequence-level training signal which is crucial to generation tasks that always rely on auto-regressive decoding. However, previous methods using contrastive learning in neural text generation usually lead to inferior performance. In this paper, we analyse the underlying reasons and propose a new Contrastive Neural Text generation framework, CoNT. CoNT addresses bottlenecks that prevent contrastive learning from being widely adopted in generation tasks from three aspects -- the construction of contrastive examples, the choice of the contrastive loss, and the strategy in decoding. We validate CoNT on five generation tasks with ten benchmarks, including machine translation, summarization, code comment generation, data-to-text generation and commonsense generation. Experimental results show that CoNT clearly outperforms the conventional training framework on all the ten benchmarks with a convincing margin. Especially, CoNT surpasses previous the most competitive contrastive learning method for text generation, by 1.50 BLEU on machine translation and 1.77 ROUGE-1 on summarization, respectively. It achieves new state-of-the-art on summarization, code comment generation (without external data) and data-to-text generation.
FiT: Parameter Efficient Few-shot Transfer Learning for Personalized and Federated Image Classification
Shysheya, Aliaksandra, Bronskill, John, Patacchiola, Massimiliano, Nowozin, Sebastian, Turner, Richard E
Modern deep learning systems are increasingly deployed in situations such as personalization and federated learning where it is necessary to support i) learning on small amounts of data, and ii) communication efficient distributed training protocols. In this work, we develop FiLM Transfer (FiT) which fulfills these requirements in the image classification setting by combining ideas from transfer learning (fixed pretrained backbones and fine-tuned FiLM adapter layers) and meta-learning (automatically configured Naive Bayes classifiers and episodic training) to yield parameter efficient models with superior classification accuracy at low-shot. The resulting parameter efficiency is key for enabling few-shot learning, inexpensive model updates for personalization, and communication efficient federated learning. We experiment with FiT on a wide range of downstream datasets and show that it achieves better classification accuracy than the leading Big Transfer (BiT) algorithm at low-shot and achieves state-of-the art accuracy on the challenging VTAB-1k benchmark, with fewer than 1% of the updateable parameters. Finally, we demonstrate the parameter efficiency and superior accuracy of FiT in distributed low-shot applications including model personalization and federated learning where model update size is an important performance metric.
A Survey of Active Learning for Natural Language Processing
Zhang, Zhisong, Strubell, Emma, Hovy, Eduard
In this work, we provide a literature review of active learning (AL) for its applications in natural language processing (NLP). In addition to a fine-grained categorization of query strategies, we also investigate several other important aspects of applying AL to NLP problems. These include AL for structured prediction tasks, annotation cost, model learning (especially Figure 1: Counts of AL (left) and "neural" (right) papers with deep neural models), and starting in the ACL Anthology over the past twenty years.
The unreasonable effectiveness of few-shot learning for machine translation
Garcia, Xavier, Bansal, Yamini, Cherry, Colin, Foster, George, Krikun, Maxim, Feng, Fangxiaoyu, Johnson, Melvin, Firat, Orhan
We demonstrate the potential of few-shot translation systems, trained with unpaired language data, for both high and low-resource language pairs. We show that with only 5 examples of high-quality translation data shown at inference, a transformer decoder-only model trained solely with self-supervised learning, is able to match specialized supervised state-of-the-art models as well as more general commercial translation systems. In particular, we outperform the best performing system on the WMT'21 English - Chinese news translation task by only using five examples of English - Chinese parallel data at inference. Moreover, our approach in building these models does not necessitate joint multilingual training or back-translation, is conceptually simple and shows the potential to extend to the multilingual setting. Furthermore, the resulting models are two orders of magnitude smaller than state-of-the-art language models. We then analyze the factors which impact the performance of few-shot translation systems, and highlight that the quality of the few-shot demonstrations heavily determines the quality of the translations generated by our models. Finally, we show that the few-shot paradigm also provides a way to control certain attributes of the translation -- we show that we are able to control for regional varieties and formality using only a five examples at inference, paving the way towards controllable machine translation systems.
Diffusion-based Image Translation using Disentangled Style and Content Representation
Our model can generate high-quality translation outputs using both text and image conditions. More results can be found in the experiment section. Diffusion-based image translation guided by semantic texts or a single target image has enabled flexible style transfer which is not limited to the specific domains. Unfortunately, due to the stochastic nature of diffusion models, it is often difficult to maintain the original content of the image during the reverse diffusion. To address this, here we present a novel diffusion-based unsupervised image translation method, dubbed as DiffuseIT, using disentangled style and content representation. Specifically, inspired by the slicing Vision Transformer (Tumanyan et al., 2022), we extract intermediate keys of multihead self attention layer from ViT model and used them as the content preservation loss. Then, an image guided style transfer is performed by matching the [CLS] classification token from the denoised samples and target image, whereas additional CLIP loss is used for the text-driven style transfer. To further accelerate the semantic change during the reverse diffusion, we also propose a novel semantic divergence loss and resampling strategy. Our experimental results show that the proposed method outperforms state-of-the-art baseline models in both text-guided and image-guided translation tasks. Image translation is a task in which the model receives an input image and converts it into a target domain. Early image translation approaches (Zhu et al., 2017; Park et al., 2020; Isola et al., 2017) were mainly designed for single domain translation, but soon extended to multi-domain translation (Choi et al., 2018; Lee et al., 2019).
User Study for Improving Tools for Bible Translation
Technology has increasingly become an integral part of the Bible translation process. Over time, both the translation process and relevant technology have evolved greatly. More recently, the field of Natural Language Processing (NLP) has made great progress in solving some problems previously thought impenetrable. Through this study we endeavor to better understand and communicate about a segment of the current landscape of the Bible translation process as it relates to technology and identify pertinent issues. We conduct several interviews with individuals working in different levels of the Bible translation process from multiple organizations to identify gaps and bottlenecks where technology (including recent advances in AI) could potentially play a pivotal role in reducing translation time and improving overall quality.
Visually Grounded Keyword Detection and Localisation for Low-Resource Languages
This study investigates the use of Visually Grounded Speech (VGS) models for keyword localisation in speech. The study focusses on two main research questions: (1) Is keyword localisation possible with VGS models and (2) Can keyword localisation be done cross-lingually in a real low-resource setting? Four methods for localisation are proposed and evaluated on an English dataset, with the best-performing method achieving an accuracy of 57%. A new dataset containing spoken captions in Yoruba language is also collected and released for cross-lingual keyword localisation. The cross-lingual model obtains a precision of 16% in actual keyword localisation and this performance can be improved by initialising from a model pretrained on English data. The study presents a detailed analysis of the model's success and failure modes and highlights the challenges of using VGS models for keyword localisation in low-resource settings.