Machine Translation
What Are The Risks And Benefits Of Artificial Intelligence?
What are the risks and benefits of artificial intelligence? It's a complicated topic, but I'll try to unpack a few key points here. Let's start with a quick definition: AI is the simulation of human intelligence by machines. Examples of AI systems used regularly in developed countries include Amazon's Alexa, smart replies in Gmail, chatbots, predictive searches in Google, and recommendations. At a baseline level, AI helps improve our everyday lives by solving pain points, streamlining processes, and advancing human knowledge.
Invariance-based Adversarial Attack on Neural Machine Translation Systems
Chaturvedi, Akshay, KP, Abijith, Garain, Utpal
Abstract: Recently, NLP models have been shown to be susceptible to adversarial attacks. In this paper, we explore adversarial attacks on neural machine translation (NMT) systems. Given a sentence in the source language, the goal of the proposed attack is to change multiple words while ensuring that the predicted translation remains unchanged. In order to choose the word from the source vocabulary, we propose a soft-attention based technique. The experiments are conducted on two language pairs, English-German (en-de) and English-French (en-fr), and two state-of-the-art NMT systems: a BLSTM-based encoder-decoder with attention and the Transformer. The proposed soft-attention based technique outperforms existing methods like HotFlip by a significant margin for all the conducted experiments. The results demonstrate that state-of-the-art NMT systems are unable to capture the semantics of the source language.
Self-Knowledge Distillation in Natural Language Processing
Since deep learning became a key player in natural language processing (NLP), many deep learning models have shown remarkable performance in a variety of NLP tasks, and in some cases they even outperform humans. Such high performance can be explained by the efficient knowledge representation of deep learning models. While many methods have been proposed to learn more efficient representations, knowledge distillation from pretrained deep networks suggests that we can use more information from the soft target probability to train other neural networks. In this paper, we propose a new knowledge distillation method, self-knowledge distillation, based on the soft target probabilities of the training model itself, where multimode information is distilled from the word embedding space right below the softmax layer. Due to the time complexity, our method approximates the soft target probabilities. In experiments, we applied the proposed method to two different and fundamental NLP tasks: language modeling and neural machine translation. The experimental results show that our proposed method improves performance on both tasks.
ACL 2019 Best Papers Announced
The Association for Computational Linguistics (ACL) held its 57th annual meeting July 28 to August 2 in Florence, Italy. Today, the ACL 2019 organizing committee announced its eight paper awards: Best Long Paper, Best Short Paper, Best Demo Paper, and five Outstanding Paper awards. The paper addresses the issue by sampling context words both from the ground truth sequence and the predicted sequence by a model during training. Researchers tested the approach on Chinese to English and WMT'14 English to German translation tasks, and achieved significant improvements on various datasets.
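The idea of mixing ground-truth and model-predicted context words during training can be illustrated with a minimal sketch. This is not the awarded paper's implementation: the function name `mix_context`, the per-position coin flip, and the fixed probability `p_gold` are assumptions chosen only to show the general mechanism.

```python
import random

def mix_context(gold, predicted, p_gold, rng):
    """Build a decoder context by picking, at each position, either the
    ground-truth word (with probability p_gold) or the model's own
    predicted word. Hypothetical helper for illustration only."""
    return [g if rng.random() < p_gold else p
            for g, p in zip(gold, predicted)]

gold = ["the", "cat", "sat", "down"]
predicted = ["a", "cat", "sits", "down"]

rng = random.Random(0)
all_gold = mix_context(gold, predicted, 1.0, rng)       # always ground truth
all_predicted = mix_context(gold, predicted, 0.0, rng)  # always model output
mixed = mix_context(gold, predicted, 0.5, rng)          # a blend of both
```

Exposing the model to its own predictions during training is meant to reduce the mismatch with inference, where no ground-truth context is available.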
Bilingual Lexicon Induction through Unsupervised Machine Translation
Artetxe, Mikel, Labaka, Gorka, Agirre, Eneko
A recent research line has obtained strong results on bilingual lexicon induction by aligning independently trained word embeddings in two languages and using the resulting cross-lingual embeddings to induce word translation pairs through nearest neighbor or related retrieval methods. In this paper, we propose an alternative approach to this problem that builds on the recent work on unsupervised machine translation. This way, instead of directly inducing a bilingual lexicon from cross-lingual embeddings, we use them to build a phrase-table, combine it with a language model, and use the resulting machine translation system to generate a synthetic parallel corpus, from which we extract the bilingual lexicon using statistical word alignment techniques. As such, our method can work with any word embedding and cross-lingual mapping technique, and it does not require any additional resource besides the monolingual corpus used to train the embeddings. When evaluated on the exact same cross-lingual embeddings, our proposed method obtains an average improvement of 6 accuracy points over nearest neighbor and 4 points over CSLS retrieval, establishing a new state-of-the-art in the standard MUSE dataset.
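For reference, the nearest-neighbor retrieval baseline mentioned in the abstract can be sketched as follows. This illustrates only the baseline, not the proposed phrase-table method or CSLS retrieval. The toy two-dimensional embeddings and the function names `cosine` and `induce_lexicon` are assumptions, and the embeddings are assumed to be already mapped into a shared cross-lingual space.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def induce_lexicon(src_emb, tgt_emb):
    """Map each source word to the target word whose embedding is the
    nearest neighbor under cosine similarity."""
    lexicon = {}
    for src_word, src_vec in src_emb.items():
        lexicon[src_word] = max(
            tgt_emb, key=lambda tgt_word: cosine(src_vec, tgt_emb[tgt_word])
        )
    return lexicon

# Toy embeddings, invented for illustration.
src_emb = {"dog": [0.9, 0.1], "cat": [0.1, 0.9]}
tgt_emb = {"perro": [0.8, 0.2], "gato": [0.2, 0.8]}

lexicon = induce_lexicon(src_emb, tgt_emb)
```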
Microsoft Unveiled a New Language Translation Feature for Its HoloLens Holograms (Digital Trends)
Not only is it possible to have a fairly realistic holographic replica of yourself, but Microsoft has just shown that it is also possible to have that same replica speak in different languages, too. According to The Verge, on Wednesday, July 17, Microsoft provided a demo of this latest innovation during its keynote speech at the Microsoft Inspire partner conference in Las Vegas. Tom Warren of The Verge posted a video clip on YouTube of Microsoft's demonstration of the hologram's language translation capabilities. Microsoft's demonstration of the technology included Azure executive Julia White, a HoloLens 2 headset, and White's hologram. White's hologram began as a small green outline of a hologram that White could hold in her hand, but as soon as she uttered two simple words, "render keynote," the small hologram grew into a fully rendered, human-sized replica of White and immediately began delivering the keynote speech in Japanese, in a voice that still matched White's.
Lookahead Optimizer: k steps forward, 1 step back
Zhang, Michael R., Lucas, James, Hinton, Geoffrey, Ba, Jimmy
The vast majority of successful deep neural networks are trained using variants of stochastic gradient descent (SGD) algorithms. Recent attempts to improve SGD can be broadly categorized into two approaches: (1) adaptive learning rate schemes, such as AdaGrad and Adam, and (2) accelerated schemes, such as heavy-ball and Nesterov momentum. In this paper, we propose a new optimization algorithm, Lookahead, that is orthogonal to these previous approaches and iteratively updates two sets of weights. Intuitively, the algorithm chooses a search direction by looking ahead at the sequence of "fast weights" generated by another optimizer. We show that Lookahead improves the learning stability and lowers the variance of its inner optimizer with negligible computation and memory cost. We empirically demonstrate Lookahead can significantly improve the performance of SGD and Adam, even with their default hyperparameter settings on ImageNet, CIFAR-10/100, neural machine translation, and Penn Treebank.
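The "k steps forward, 1 step back" idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the inner optimizer is plain gradient descent on a toy quadratic, and the names `lookahead_sgd`, `quadratic_grad`, and the hyperparameter values are chosen here for the example. The fast weights take `k` inner steps, then the slow weights move a fraction `alpha` of the way toward them.

```python
def lookahead_sgd(grad, theta0, k=5, alpha=0.5, lr=0.1, outer_steps=20):
    """Lookahead wrapped around plain gradient descent (illustrative sketch)."""
    slow = list(theta0)                      # slow weights
    for _ in range(outer_steps):
        fast = list(slow)                    # fast weights restart from slow weights
        for _ in range(k):                   # k steps forward with the inner optimizer
            g = grad(fast)
            fast = [w - lr * gi for w, gi in zip(fast, g)]
        # 1 step back: interpolate the slow weights toward the final fast weights
        slow = [s + alpha * (f - s) for s, f in zip(slow, fast)]
    return slow

def quadratic_grad(w):
    """Gradient of the toy objective f(w) = sum((w_i - 3)^2)."""
    return [2.0 * (wi - 3.0) for wi in w]

result = lookahead_sgd(quadratic_grad, [0.0, 10.0])
```

On this toy objective the slow weights approach the minimizer at 3 in each coordinate. In practice the inner loop would use minibatch SGD or Adam.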
Hierarchical Sequence to Sequence Voice Conversion with Limited Data
Narayanan, Praveen, Chakravarty, Punarjay, Charette, Francois, Puskorius, Gint
We present a voice conversion solution using recurrent sequence to sequence modeling for DNNs. Our solution takes advantage of recent advances in attention based modeling in the fields of Neural Machine Translation (NMT), Text-to-Speech (TTS) and Automatic Speech Recognition (ASR). The problem consists of converting between voices in a parallel setting when <source, target> audio pairs are available. Our seq2seq architecture makes use of a hierarchical encoder to summarize input audio frames. On the decoder side, we use an attention based architecture used in recent TTS works. Since there is a dearth of large multispeaker voice conversion databases needed for training DNNs, we resort to training the network with a large single speaker dataset as an autoencoder. This is then adapted for the smaller multispeaker voice conversion datasets available for voice conversion. In contrast with other voice conversion works that use F0, duration and linguistic features, our system uses mel spectrograms as the audio representation. Output mel frames are converted back to audio using a wavenet vocoder.
Ten Machine Learning Algorithms You Should Know to Become a Data Scientist
Let's say I am given an Excel sheet with data about various fruits and I have to tell which ones look like apples. What I will do is ask the question "Which fruits are red and round?" and divide the fruits into those that answer yes and those that answer no. Now, all red and round fruits might not be apples, and not all apples will be red and round. So I will ask the question "Which fruits have red or yellow colour hints on them?" of the red and round fruits, and "Which fruits are green and round?" of the fruits that are not red and round. Based on these questions I can tell with considerable accuracy which are apples. This cascade of questions is what a decision tree is. However, this is a decision tree based on my intuition.
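The cascade of questions above can be written out as a small hand-built decision tree. This is only a sketch of the intuition-based tree from the example: the dictionary keys (`color`, `shape`, `red_or_yellow_hints`) and the choice to label each final "yes" branch as an apple are assumptions made for illustration.

```python
def looks_like_apple(fruit):
    """Hand-written decision tree mirroring the three questions in the text."""
    # Question 1: is the fruit red and round?
    if fruit["color"] == "red" and fruit["shape"] == "round":
        # Question 2 (yes branch): does it have red or yellow colour hints?
        return fruit.get("red_or_yellow_hints", False)
    # Question 3 (no branch): is it green and round?
    return fruit["color"] == "green" and fruit["shape"] == "round"

# Toy rows standing in for the spreadsheet; values are invented.
fruits = [
    {"name": "A", "color": "red", "shape": "round", "red_or_yellow_hints": True},
    {"name": "B", "color": "green", "shape": "round"},
    {"name": "C", "color": "yellow", "shape": "long"},
    {"name": "D", "color": "red", "shape": "round", "red_or_yellow_hints": False},
]

apple_like = [f["name"] for f in fruits if looks_like_apple(f)]
```

A learned decision tree replaces these hand-picked questions with splits chosen from the data.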
Task Selection Policies for Multitask Learning
One of the questions that arises when designing models that learn to solve multiple tasks simultaneously is how much of the available training budget should be devoted to each individual task. We refer to any formalized approach to addressing this problem (learned or otherwise) as a task selection policy. In this work we provide an empirical evaluation of the performance of some common task selection policies in a synthetic bandit-style setting, as well as on the GLUE benchmark for natural language understanding. We connect task selection policy learning to existing work on automated curriculum learning and off-policy evaluation, and suggest a method based on counterfactual estimation that leads to improved model performance in our experimental settings.