 Machine Translation


Modelling Bahdanau Attention using Election methods aided by Q-Learning

arXiv.org Machine Learning

Neural Machine Translation has lately gained a lot of "attention" with the advent of increasingly sophisticated and drastically improved models. The attention mechanism has proved to be a boon in this direction by assigning weights to the input words, making it easy for the decoder to identify the words that represent the present context. But as newer, more complex attention models were developed, they required heavy computation, making inference slow. In this paper, we model the attention network using techniques inspired by social choice theory. In addition, the attention mechanism, being a Markov Decision Process, is represented using reinforcement learning techniques. Thus, we propose to use an election method (k-Borda), fine-tuned using Q-learning, as a replacement for attention networks. The inference time for this network is lower than that of a standard Bahdanau translator, and the translation results are comparable. This not only verifies the above claims experimentally but also provides faster inference.
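
For readers unfamiliar with the election method named above, the following is a minimal sketch of plain k-Borda selection. The abstract does not say how voters, candidates, or the Q-learning fine-tuning are defined in the paper, so the "voters" (ranked lists of input words) and the function names here are hypothetical illustrations, not the authors' implementation.

```python
from typing import Dict, List, Sequence


def borda_scores(rankings: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Sum Borda points over all voters.

    Each ranking lists candidates from most to least preferred; with m
    candidates, the top choice earns m - 1 points and the last earns 0.
    """
    scores: Dict[str, int] = {}
    for ranking in rankings:
        m = len(ranking)
        for position, candidate in enumerate(ranking):
            scores[candidate] = scores.get(candidate, 0) + (m - 1 - position)
    return scores


def k_borda(rankings: Sequence[Sequence[str]], k: int) -> List[str]:
    """Return the k candidates with the highest total Borda score."""
    scores = borda_scores(rankings)
    # Sort by score (descending); break ties alphabetically for determinism.
    ordered = sorted(scores, key=lambda c: (-scores[c], c))
    return ordered[:k]


# Hypothetical voters: each one ranks the same four input words.
rankings = [
    ["the", "cat", "sat", "down"],
    ["cat", "sat", "the", "down"],
    ["cat", "the", "down", "sat"],
]
winners = k_borda(rankings, k=2)
print(winners)
```

In this toy election, "cat" collects 8 points and "the" collects 6, so they form the winning committee of size 2. How such a committee would stand in for attention weights is specific to the paper and is not reproduced here.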


Instance-based Transfer Learning for Multilingual Deep Retrieval

arXiv.org Machine Learning

Perhaps the simplest type of multilingual transfer learning is instance-based transfer learning, in which data from the target language and the auxiliary languages are pooled, and a single model is learned from the pooled data. It is not immediately obvious when instance-based transfer learning will improve performance in this multilingual setting: for instance, a plausible conjecture is this kind of transfer learning would help only if the auxiliary languages were very similar to the target. Here we show that at large scale, this method is surprisingly effective, leading to positive transfer on all of 35 target languages we tested. We analyze this improvement and argue that the most natural explanation, namely direct vocabulary overlap between languages, only partially explains the performance gains: in fact, we demonstrate target-language improvement can occur after adding data from an auxiliary language with no vocabulary in common with the target. This surprising result is due to the effect of transitive vocabulary overlaps between pairs of auxiliary and target languages.


Biconditional Generative Adversarial Networks for Multiview Learning with Missing Views

arXiv.org Machine Learning

In this paper, we present a conditional GAN with two generators and a common discriminator for multiview learning problems where observations have two views, but one of them may be missing for some of the training samples. This is, for example, the case for multilingual collections in which documents are not available in all languages. Some studies have tackled this problem by assuming the existence of view generation functions that approximately complete the missing views; for example, Machine Translation to translate documents into the missing languages. These functions generally require an external resource to be set up, and their quality has a direct impact on the performance of the multiview classifier learned over the completed training set. Our proposed approach addresses this problem by jointly learning the missing views and the multiview classifier using a tripartite game with two generators and a discriminator. Each generator is associated with one of the views and tries to fool the discriminator by generating the other, missing view conditionally on the corresponding observed view. The discriminator then tries to identify, for an observation, whether one of its views is completed by one of the generators or whether both views are completed, along with its class. Our results on a subset of the Reuters RCV1/RCV2 collections show that the discriminator achieves significant classification performance, and that the generators learn the missing views with high quality without the need of any consequent external resource.


Can Neural Networks Learn Symbolic Rewriting?

arXiv.org Artificial Intelligence

This work investigates whether current neural architectures are adequate for learning symbolic rewriting. Two kinds of data sets are proposed for this research: one based on automated proofs and the other a synthetic set of polynomial terms. Experiments using current neural machine translation models are performed and their results are discussed. Ideas for extending this line of research are proposed and their relevance is motivated.


Google's New AI Milestone: Neural Machine Translation Engine Can Now Translate 103 Languages

#artificialintelligence

Neural Machine Translation (NMT), one of the most important topics in deep learning, has gained much attention from industry and academia over the last few years. In order to create simple models out of complex ones, tech giant Google has been innovating in the domain of human-to-machine and machine-to-human translation for several years. Back in 2017, the company introduced a solution that uses a simple Neural Machine Translation (NMT) model to translate between multiple languages, in which the researchers merged 12 language pairs into a single model. The models fall into three types: many-to-one, one-to-many and many-to-many. Recently, researchers on the Google AI team built an enhanced system for neural machine translation and published a paper titled "Massively Multilingual Neural Machine Translation in the Wild: Findings and Challenges".


Microsoft Research Asia's Systems for WMT19

arXiv.org Machine Learning

Yingce Xia, Xu Tan, Fei Tian, Fei Gao, Weicong Chen, Yang Fan, Linyuan Gong, Yichong Leng, Renqian Luo, Yiren Wang, Lijun Wu, Jinhua Zhu, Tao Qin, Tie-Yan Liu
Microsoft Research Asia

Abstract

We, Microsoft Research Asia, made submissions to 11 language directions in the WMT19 news translation tasks. We won first place in 8 of the 11 directions and second place in the other three. Our basic systems are built on Transformer, back translation and knowledge distillation. We integrate several of our recent techniques to enhance the baseline systems: multi-agent dual learning (MADL), masked sequence-to-sequence pre-training (MASS), neural architecture optimization (NAO), and soft contextual data augmentation (SCA).

1 Introduction

We participated in the WMT19 shared news translation task in 11 translation directions. We achieved first place in 8 directions: German↔English, German↔French, Chinese→English, English→Lithuanian, English→Finnish, and Russian→English. The three other directions were placed second (ranked by teams): Lithuanian→English, Finnish→English, and English→Kazakh. Our basic systems are based on Transformer, back translation and knowledge distillation. We experimented with several techniques we proposed recently. In brief, the innovations we introduced are:

Multi-agent dual learning (MADL): The core idea of dual learning is to leverage the duality between the primal task (mapping from domain X to domain Y) and the dual task (mapping from domain Y to X) to boost the performance of both tasks. MADL (Wang et al., 2019) extends the dual learning (He et al., 2016; Xia et al., 2017a) framework by introducing multiple primal and dual models. It was integrated into our submitted systems for ...


Domain, Translationese and Noise in Synthetic Data for Neural Machine Translation

arXiv.org Machine Learning

The quality of neural machine translation can be improved by leveraging additional monolingual resources to create synthetic training data. Source-side monolingual data can be (forward-)translated into the target language for self-training; target-side monolingual data can be back-translated. It has been widely reported that back-translation delivers superior results, but could this be due to artefacts in the test sets? We perform a case study using French-English news translation task and separate test sets based on their original languages. We show that forward translation delivers superior gains in terms of BLEU on sentences that were originally in the source language, complementing previous studies which show large improvements with back-translation on sentences that were originally in the target language. To better understand when and why forward and back-translation are effective, we study the role of domains, translationese, and noise. While translationese effects are well known to influence MT evaluation, we also find evidence that news data from different languages shows subtle domain differences, which is another explanation for varying performance on different portions of the test set. We perform additional low-resource experiments which demonstrate that forward translation is more sensitive to the quality of the initial translation system than back-translation, and tends to perform worse in low-resource settings.
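
The difference between the two kinds of synthetic data described above can be sketched as follows. This is only an illustration under stated assumptions: the dictionary-based "translators" are toy stand-ins for trained MT systems, and all function and variable names are hypothetical rather than taken from the paper.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def forward_translate(source_mono: List[str],
                      src_to_tgt: Callable[[str], str]) -> List[Pair]:
    """Self-training data: real source sentences, synthetic targets."""
    return [(s, src_to_tgt(s)) for s in source_mono]


def back_translate(target_mono: List[str],
                   tgt_to_src: Callable[[str], str]) -> List[Pair]:
    """Back-translation data: synthetic sources, real target sentences."""
    return [(tgt_to_src(t), t) for t in target_mono]


# Toy word-by-word "systems" (assumption: stand-ins for real MT models).
FR_EN = {"bonjour": "hello", "merci": "thanks"}
EN_FR = {en: fr for fr, en in FR_EN.items()}


def toy_fr_to_en(sentence: str) -> str:
    return " ".join(FR_EN.get(word, word) for word in sentence.split())


def toy_en_to_fr(sentence: str) -> str:
    return " ".join(EN_FR.get(word, word) for word in sentence.split())


# French -> English system: French is the source side.
forward_pairs = forward_translate(["bonjour", "merci"], toy_fr_to_en)
back_pairs = back_translate(["hello thanks"], toy_en_to_fr)
print(forward_pairs)
print(back_pairs)
```

In both cases the output is a list of (source, target) pairs that can be mixed with genuine parallel data; what differs is which side is machine-generated, and therefore which side carries translation noise.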


MLPerf Inference Benchmark

arXiv.org Machine Learning

Machine-learning (ML) hardware and software system demand is burgeoning. Driven by ML applications, the number of different ML inference systems has exploded. Over 100 organizations are building ML inference chips, and the systems that incorporate existing models span at least three orders of magnitude in power consumption and four orders of magnitude in performance; they range from embedded devices to data-center solutions. Fueling the hardware are a dozen or more software frameworks and libraries. The myriad combinations of ML hardware and ML software make assessing ML-system performance in an architecture-neutral, representative, and reproducible manner challenging. There is a clear need for industry-wide standard ML benchmarking and evaluation criteria. MLPerf Inference answers that call. Driven by more than 30 organizations as well as more than 200 ML engineers and practitioners, MLPerf implements a set of rules and practices to ensure comparability across systems with wildly differing architectures. In this paper, we present the method and design principles of the initial MLPerf Inference release. The first call for submissions garnered more than 600 inference-performance measurements from 14 organizations, representing over 30 systems that show a range of capabilities.


Training Neural Machine Translation (NMT) Models using Tensor Train Decomposition on TensorFlow (T3F)

arXiv.org Machine Learning

Neural Machine Translation (NMT) is a deep learning model that provides a robust method for machine translation using recurrent neural networks (RNNs). Originally proposed in [1], NMT relies primarily on an encoder-decoder architecture that provides increased fluency over phrase-based systems. This was implemented successfully in [2] for fast, accurate use on very large datasets. However, it has been suggested that there is significant redundancy in the current method of neural network parametrization [3], presenting the opportunity for significant speedup. Tensor Train (TT) decomposition [4] is a method by which large tensors can be approximated by the product of a 'train' of smaller matrices (see Section 2.2). TT decomposition has been proposed as a method of speeding up and reducing the memory usage of machine translation systems with dense weight matrices by reducing the number of parameters required to describe the system [3].
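
To make the idea of a 'train' of small factors concrete, here is a minimal NumPy sketch of a Tensor Train decomposition computed by sequential truncated SVDs (often called TT-SVD). It is an illustration only: the paper works with TensorFlow and T3F, whereas this sketch uses plain NumPy, and the function names and the single `max_rank` cap are assumptions made for the example.

```python
import numpy as np


def tt_svd(tensor: np.ndarray, max_rank: int) -> list:
    """Decompose `tensor` into TT cores of shape (r_prev, n_k, r_next)."""
    shape = tensor.shape
    cores = []
    rank = 1
    mat = tensor
    for n in shape[:-1]:
        # Unfold: rows index (previous rank, current mode), columns the rest.
        mat = mat.reshape(rank * n, -1)
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        new_rank = min(max_rank, len(s))
        cores.append(u[:, :new_rank].reshape(rank, n, new_rank))
        # Carry the remaining factor (singular values times V^T) forward.
        mat = s[:new_rank, None] * vt[:new_rank]
        rank = new_rank
    cores.append(mat.reshape(rank, shape[-1], 1))
    return cores


def tt_reconstruct(cores: list) -> np.ndarray:
    """Multiply the cores back together into a full tensor."""
    result = cores[0]
    for core in cores[1:]:
        result = np.tensordot(result, core, axes=([-1], [0]))
    return result.reshape([core.shape[1] for core in cores])


# A rank-1 tensor of shape (4, 5, 6), built as an outer product.
rng = np.random.default_rng(0)
a, b, c = rng.normal(size=4), rng.normal(size=5), rng.normal(size=6)
full = np.einsum("i,j,k->ijk", a, b, c)

cores = tt_svd(full, max_rank=1)
tt_params = sum(core.size for core in cores)
print(full.size, tt_params)
```

For this deliberately low-rank example, the three cores hold 4 + 5 + 6 = 15 numbers in place of the 120 entries of the full tensor, and `tt_reconstruct(cores)` recovers the tensor up to floating-point error. Real weight matrices are not exactly low-rank, so in practice the rank cap trades approximation error against parameter count.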


Machine Learning for Translation: What's the State of the Language Art? - ReadWrite

#artificialintelligence

A new batch of Machine Translation tools driven by Artificial Intelligence is already translating tens of millions of messages per day. Proprietary ML translation solutions from Google, Microsoft, and Amazon are in daily use. Facebook takes its road with open-source approaches. What works best for translating software, documentation, and natural language content? And where is the automation of AI-driven neural networks driving? William Mamane, Head of Digital Marketing at Tomedes, a professional language services agency, had been a skeptic of machine translation.