Machine Translation
How to stop AI from perpetuating harmful biases
Artificial Intelligence (AI) is already re-configuring the world in conspicuous ways. Data drives our global digital ecosystem, and AI technologies reveal patterns in data. Smartphones, smart homes, and smart cities influence how we live and interact, and AI systems are increasingly involved in recruitment decisions, medical diagnoses, and judicial verdicts. Whether this scenario is utopian or dystopian depends on your perspective. The potential risks of AI are enumerated repeatedly.
Discretized Bottleneck in VAE: Posterior-Collapse-Free Sequence-to-Sequence Learning
Zhao, Yang, Yu, Ping, Mahapatra, Suchismit, Su, Qinliang, Chen, Changyou
Variational autoencoders (VAEs) are important tools in end-to-end representation learning. VAEs can capture complex data distributions and have been applied extensively in many natural-language-processing (NLP) tasks. However, a common pitfall in sequence-to-sequence learning with VAEs is the posterior-collapse issue in latent space, wherein the model tends to ignore latent variables when a strong auto-regressive decoder is implemented. In this paper, we propose a principled approach to eliminate this issue by applying a discretized bottleneck in the latent space. Specifically, we impose a shared discrete latent space where each input is learned to choose a combination of shared latent atoms as its latent representation. Compared with VAEs employing continuous latent variables, our model endows more promising capability in modeling underlying semantics of discrete sequences and can thus provide more interpretative latent structures. Empirically, we demonstrate the efficiency and effectiveness of our model on a broad range of tasks, including language modeling, unaligned text style transfer, dialog response generation, and neural machine translation.
Could AI make language learning obsolete?
Perhaps we can expect an iPhone-like symphonic progression in models here? Many companies are throwing their hat into the translation technology ring. Web translation software is being surpassed by portable, state-of-the-art technology in the form of earpieces, hand-held devices and apps, all of which are enabling users to quickly navigate our multilingual world on-the-go. Most recently, American Airlines announced it is testing interpreter mode for Google Assistant to help communication between their employees and travellers who speak a different language. In recent years, artificial intelligence (AI) has drastically enhanced the accuracy and quality of foreign language translations โ allowing machines to help break down language barriers for customer service teams and tourists alike.
BLEU might be Guilty but References are not Innocent
Freitag, Markus, Grangier, David, Caswell, Isaac
The quality of automatic metrics for machine translation has been increasingly called into question, especially for high-quality systems. This paper demonstrates that, while choice of metric is important, the nature of the references is also critical. We study different methods to collect references and compare their value in automated evaluation by reporting correlation with human evaluation for a variety of systems and metrics. Motivated by the finding that typical references exhibit poor diversity, concentrating around translationese language, we develop a paraphrasing task for linguists to perform on existing reference translations, which counteracts this bias. Our method yields higher correlation with human judgment not only for the submissions of WMT 2019 English to German, but also for Back-translation and APE augmented MT output, which have been shown to have low correlation with automatic metrics using standard references. We demonstrate that our methodology improves correlation with all modern evaluation metrics we look at, including embedding-based methods. To complete this picture, we reveal that multi-reference BLEU does not improve the correlation for high quality output, and present an alternative multi-reference formulation that is more effective.
An In-depth Walkthrough on Evolution of Neural Machine Translation
Jagtap, Rohan, Dhage, Sudhir N.
Neural Machine Translation (NMT) methodologies have burgeoned from using simple feed-forward architectures to the state of the art; viz. BERT model. The use cases of NMT models have been broadened from just language translations to conversational agents (chatbots), abstractive text summarization, image captioning, etc. which have proved to be a gem in their respective applications. This paper aims to study the major trends in Neural Machine Translation, the state of the art models in the domain and a high level comparison between them.
Joint translation and unit conversion for end-to-end localization
Dinu, Georgiana, Mathur, Prashant, Federico, Marcello, Lauly, Stanislas, Al-Onaizan, Yaser
A variety of natural language tasks require processing of textual data which contains a mix of natural language and formal languages such as mathematical expressions. In this paper, we take unit conversions as an example and propose a data augmentation technique which leads to models learning both translation and conversion tasks as well as how to adequately switch between them for end-to-end localization.
Architecture for a multilingual Wikipedia
Wikipedia's vision is a world in which everyone can share in the sum of all knowledge. In its first two decades, this vision has been very unevenly achieved. One of the largest hindrances is the sheer number of languages Wikipedia needs to cover in order to achieve that goal. We argue that we need a new approach to tackle this problem more effectively, a multilingual Wikipedia where content can be shared between language editions. This paper proposes an architecture for a system that fulfills this goal. It separates the goal in two parts: creating and maintaining content in an abstract notation within a project called Abstract Wikipedia, and creating an infrastructure called Wikilambda that can translate this notation to natural language. Both parts are fully owned and maintained by the community, as is the integration of the results in the existing Wikipedia editions. This architecture will make more encyclopedic content available to more people in their own language, and at the same time allow more people to contribute knowledge and reach more people with their contributions, no matter what their respective language backgrounds. Additionally, Wikilambda will unlock a new type of knowledge asset people can share in through the Wikimedia projects, functions, which will vastly expand what people can do with knowledge from Wikimedia, and provide a new venue to collaborate and to engage the creativity of contributors from all around the world. These two projects will considerably expand the capabilities of the Wikimedia platform to enable every single human being to freely share in the sum of all knowledge.
Applying Cyclical Learning Rate to Neural Machine Translation
Lee, Choon Meng, Liu, Jianfeng, Peng, Wei
In training deep learning networks, the optimizer and related learning rate are often used without much thought or with minimal tuning, even though it is crucial in ensuring a fast convergence to a good quality minimum of the loss function that can also generalize well on the test dataset. Drawing inspiration from the successful application of cyclical learning rate policy for computer vision related convolutional networks and datasets, we explore how cyclical learning rate can be applied to train transformer-based neural networks for neural machine translation. From our carefully designed experiments, we show that the choice of optimizers and the associated cyclical learning rate policy can have a significant impact on the performance. In addition, we establish guidelines when applying cyclical learning rates to neural machine translation tasks. Thus with our work, we hope to raise awareness of the importance of selecting the right optimizers and the accompanying learning rate policy, at the same time, encourage further research into easy-to-use learning rate policies.
Neural Machine Translation with Imbalanced Classes
We cast neural machine translation (NMT) as a classification task in an autoregressive setting and analyze the limitations of both classification and autoregression components. Classifiers are known to perform better with balanced class distributions during training. Since the Zipfian nature of languages causes imbalanced classes, we explore the effect of class imbalance on NMT. We analyze the effect of vocabulary sizes on NMT performance and reveal an explanation for 'why' certain vocabulary sizes are better than others.
Machine Translation for User-Generated Content
A specific use case worth exploring in this regard is MT for User Generated Content (UGC). Because of the speed with which UGC (comments, feedback, reviews) is being created and the corresponding costs of its professional translation, many organizations turn to MT. Popular examples of such companies are Skype (in addition to text translation, Microsoft developed the Automatic Speech Recognition (ASR) for audio speech translation in Skype) and Facebook. The social network is aiming to solve the challenge of fine-tuning each system relating to a specific language pair, using neural machine translation (NMT) and benefiting from various contexts for translations. One solution that tackles this issue is the technology developed by Language I/O. It takes into account the client's glossaries and TMs, selects the best MT engine output and then improves on the results using cultural intelligence and/or human linguists who compare machine translations post-facto to ensure that their MT Optimizer engine learns over time.