Goto

Collaborating Authors

 Machine Translation


Multi-Task Networks With Universe, Group, and Task Feature Learning

arXiv.org Artificial Intelligence

We present methods for multi-task learning that take advantage of natural groupings of related tasks. Task groups may be defined along known properties of the tasks, such as task domain or language. Such task groups represent supervised information at the inter-task level and can be encoded into the model. We investigate two variants of neural network architectures that accomplish this, learning different feature spaces at the levels of individual tasks, task groups, as well as the universe of all tasks: (1) parallel architectures encode each input simultaneously into feature spaces at different levels; (2) serial architectures encode each input successively into feature spaces at different levels in the task hierarchy. We demonstrate the methods on natural language understanding (NLU) tasks, where a grouping of tasks into different task domains leads to improved performance on ATIS, Snips, and a large inhouse dataset.


On the Weaknesses of Reinforcement Learning for Neural Machine Translation

arXiv.org Artificial Intelligence

Reinforcement learning (RL) is frequently used to increase performance in text generation tasks, including machine translation (MT), notably through the use of Minimum Risk Training (MRT) and Generative Adversarial Networks (GAN). However, little is known about what and how these methods learn in the context of MT. We prove that one of the most common RL methods for MT does not optimize the expected reward, as well as show that other methods take an infeasibly long time to converge. In fact, our results suggest that RL practices in MT are likely to improve performance only where the pre-trained parameters are already close to yielding the correct translation. Our findings further suggest that observed gains may be due to effects unrelated to the training signal, but rather from changes in the shape of the distribution curve.


Compensating for NLP's Lack of Understanding

#artificialintelligence

The saying "a picture is worth a thousand words" does something of an injustice to the medium of language. It suggests that words are an inefficient form of communication when in fact the opposite is true. When humans use language to communicate, so much is left out because the speaker and listener share experience of the same world, which makes explicit statements about that shared world unnecessary in everyday speech. For example, if I say to you "the vase is on its side, rolling along the table," I don't need to also tell you that the vase is made of fragile stuff (it's a reasonable assumption that it is), or that the table doesn't have edges that will stop the vase's rolling, or that as a result the vase will likely roll off the table, or that gravity will make the vase to fall to the floor, which is hard and will therefore cause the fragile vase to shatter. It's enough for me to say "the vase is on its side, rolling along the table" for you to know the vase will likely smash to pieces unless someone intervenes.


How artificial intelligence is powering cybersecurity

#artificialintelligence

Technology progresses daily and with this progression come threats and risks to social, financial and economic life. Today, cyber-attackers have resorted to the use of automation to launch more frequent attacks on different businesses and corporations. While cyber-attackers are expending a lot of resources to launch more sophisticated attacks, many organizations still rely on manual efforts to gather internal security findings and contextualize them with external threat information. It was reported that carelessness of employee was the reason behind the ransomware attack in 51 percent of the cases. However, such outdated methods and strategies need to part way for AI because they use up a lot of time, in which cyber-attackers can successfully take advantage of vulnerabilities to breach systems and steal data.


Translationese in Machine Translation Evaluation

arXiv.org Artificial Intelligence

The term translationese has been used to describe the presence of unusual features of translated text. In this paper, we provide a detailed analysis of the adverse effects of translationese on machine translation evaluation results. Our analysis shows evidence to support differences in text originally written in a given language relative to translated text and this can potentially negatively impact the accuracy of machine translation evaluations. For this reason we recommend that reverse-created test data be omitted from future machine translation test sets. In addition, we provide a re-evaluation of a past high-profile machine translation evaluation claiming human-parity of MT, as well as analysis of the since re-evaluations of it. We find potential ways of improving the reliability of all three past evaluations. One important issue not previously considered is the statistical power of significance tests applied in past evaluations that aim to investigate human-parity of MT. Since the very aim of such evaluations is to reveal legitimate ties between human and MT systems, power analysis is of particular importance, where low power could result in claims of human parity that in fact simply correspond to Type II error. We therefore provide a detailed power analysis of tests used in such evaluations to provide an indication of a suitable minimum sample size of translations for such studies. Subsequently, since no past evaluation that aimed to investigate claims of human parity ticks all boxes in terms of accuracy and reliability, we rerun the evaluation of the systems claiming human parity. Finally, we provide a comprehensive check-list for future machine translation evaluation.


Sequence Generation: From Both Sides to the Middle

arXiv.org Artificial Intelligence

The encoder-decoder framework has achieved promising process for many sequence generation tasks, such as neural machine translation and text summarization. Such a framework usually generates a sequence token by token from left to right, hence (1) this autoregressive decoding procedure is time-consuming when the output sentence becomes longer, and (2) it lacks the guidance of future context which is crucial to avoid under translation. To alleviate these issues, we propose a synchronous bidirectional sequence generation (SBSG) model which predicts its outputs from both sides to the middle simultaneously. In the SBSG model, we enable the left-to-right (L2R) and right-to-left (R2L) generation to help and interact with each other by leveraging interactive bidirectional attention network. Experiments on neural machine translation (En-De, Ch-En, and En-Ro) and text summarization tasks show that the proposed model significantly speeds up decoding while improving the generation quality compared to the autoregressive Transformer.


Retrieving Sequential Information for Non-Autoregressive Neural Machine Translation

arXiv.org Artificial Intelligence

Non-Autoregressive Transformer (NAT) aims to accelerate the Transformer model through discarding the autoregressive mechanism and generating target words independently, which fails to exploit the target sequential information. Over-translation and under-translation errors often occur for the above reason, especially in the long sentence translation scenario. In this paper, we propose two approaches to retrieve the target sequential information for NAT to enhance its translation ability while preserving the fast-decoding property. Firstly, we propose a sequence-level training method based on a novel reinforcement algorithm for NAT (Reinforce-NAT) to reduce the variance and stabilize the training procedure. Secondly, we propose an innovative Transformer decoder named FS-decoder to fuse the target sequential information into the top layer of the decoder. Experimental results on three translation tasks show that the Reinforce-NAT surpasses the baseline NAT system by a significant margin on BLEU without decelerating the decoding speed and the FS-decoder achieves comparable translation performance to the autoregressive Transformer with considerable speedup.


Universal Approximation of Input-Output Maps by Temporal Convolutional Nets

arXiv.org Machine Learning

There has been a recent shift in sequence-to-sequence modeling from recurrent network architectures to convolutional network architectures due to computational advantages in training and operation while still achieving competitive performance. For systems having limited long-term temporal dependencies, the approximation capability of recurrent networks is essentially equivalent to that of temporal convolutional nets (TCNs). We prove that TCNs can approximate a large class of input-output maps having approximately finite memory to arbitrary error tolerance. Furthermore, we derive quantitative approximation rates for deep ReLU TCNs in terms of the width and depth of the network and modulus of continuity of the original input-output map, and apply these results to input-output maps of systems that admit finite-dimensional state-space realizations (i.e., recurrent models).


Placeto: Learning Generalizable Device Placement Algorithms for Distributed Machine Learning

arXiv.org Machine Learning

We present Placeto, a reinforcement learning (RL) approach to efficiently find device placements for distributed neural network training. Unlike prior approaches that only find a device placement for a specific computation graph, Placeto can learn generalizable device placement policies that can be applied to any graph. We propose two key ideas in our approach: (1) we represent the policy as performing iterative placement improvements, rather than outputting a placement in one shot; (2) we use graph embeddings to capture relevant information about the structure of the computation graph, without relying on node labels for indexing. These ideas allow Placeto to train efficiently and generalize to unseen graphs. Our experiments show that Placeto requires up to 6.1x fewer training steps to find placements that are on par with or better than the best placements found by prior approaches. Moreover, Placeto is able to learn a generalizable placement policy for any given family of graphs, which can then be used without any retraining to predict optimized placements for unseen graphs from the same family. This eliminates the large overhead incurred by prior RL approaches whose lack of generalizability necessitates re-training from scratch every time a new graph is to be placed.


The Challenge of Open Source MT SDL

#artificialintelligence

The very large majority of open-source MT efforts fail because they do not consistently produce output that is equal to, or better than, any easily accessed public MT solution or because they cannot be deployed effectively. This is not to say that this is not possible, but the investments and long-term commitment required for success are often underestimated or simply not properly understood. A case can always be made for private systems that offer greater control and security, even if they are generally less accurate than public MT options. However, in the localization industry we see that if "free" MT solutions that are superior to an LSP-built system are available, translators will use them. We also find that for the few self-developed MT systems that do produce useful output quality, integration issues are often an impediment to deployment at enterprise scale and robustness. Some say that those who ignore the lessons of history are doomed to repeat errors.