Machine Translation
Self-Regulated Interactive Sequence-to-Sequence Learning
Kreutzer, Julia, Riezler, Stefan
Not all types of supervision signals are created equal: Different types of feedback have different costs and effects on learning. We show how self-regulation strategies that decide when to ask for which kind of feedback from a teacher (or from oneself) can be cast as a learning-to-learn problem leading to improved cost-aware sequence-to-sequence learning. In experiments on interactive neural machine translation, we find that the self-regulator discovers an $\epsilon$-greedy strategy for the optimal cost-quality trade-off by mixing different feedback types including corrections, error markups, and self-supervision. Furthermore, we demonstrate its robustness under domain shift and identify it as a promising alternative to active learning.
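As a rough illustration of what an $\epsilon$-greedy strategy over feedback types looks like, the sketch below treats each feedback type as a bandit arm. It is a generic $\epsilon$-greedy selector, not the learned self-regulator from the paper. The function names, the reward values, and the idea of scoring an arm by "quality gain minus cost" are assumptions made for illustration only.

```python
import random


def epsilon_greedy_choice(values, epsilon, rng):
    """Explore uniformly with probability epsilon, otherwise pick the best-valued arm."""
    if rng.random() < epsilon:
        return rng.choice(list(values))
    return max(values, key=values.get)


def update_value(values, counts, arm, reward):
    """Update the running mean reward of the chosen arm (incremental average)."""
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical arms named after the feedback types mentioned in the abstract.
    arms = ["correction", "error_markup", "self_supervision"]
    values = {arm: 0.0 for arm in arms}
    counts = {arm: 0 for arm in arms}
    # Hypothetical mean rewards (quality gain minus cost); not taken from the paper.
    true_reward = {"correction": 0.3, "error_markup": 0.5, "self_supervision": 0.2}
    for _ in range(1000):
        arm = epsilon_greedy_choice(values, epsilon=0.1, rng=rng)
        reward = true_reward[arm] + rng.gauss(0.0, 0.1)
        update_value(values, counts, arm, reward)
    print(counts)
```

With a small $\epsilon$, most requests go to the arm with the best estimated trade-off, while occasional random requests keep the estimates for the other feedback types up to date.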
Multiple Generative Models Ensemble for Knowledge-Driven Proactive Human-Computer Dialogue Agent
Dai, Zelin, Liu, Weitang, Zhang, Hao, Zhu, Minghao, Wang, Long
Multiple sequence-to-sequence models were used to establish an end-to-end multi-turn proactive dialogue generation agent, with the aid of data augmentation techniques and variant encoder-decoder structure designs. A rank-based ensemble approach was developed for boosting performance. Results indicate that our single model, on average, achieves a clear improvement in terms of F1-score and BLEU over the baseline by 18.67% on the DuConv dataset. In particular, the ensemble methods further significantly outperform the baseline by 35.85%.
Learning Neural Sequence-to-Sequence Models from Weak Feedback with Bipolar Ramp Loss
Jehl, Laura, Lawrence, Carolin, Riezler, Stefan
In many machine learning scenarios, supervision by gold labels is not available and consequently neural models cannot be trained directly by maximum likelihood estimation (MLE). In a weak supervision scenario, metric-augmented objectives can be employed to assign feedback to model outputs, which can be used to extract a supervision signal for training. We present several objectives for two separate weakly supervised tasks, machine translation and semantic parsing. We show that objectives should actively discourage negative outputs in addition to promoting a surrogate gold structure. This notion of bipolarity is naturally present in ramp loss objectives, which we adapt to neural models. We show that bipolar ramp loss objectives outperform other non-bipolar ramp loss objectives and minimum risk training (MRT) on both weakly supervised tasks, as well as on a supervised machine translation task. Additionally, we introduce a novel token-level ramp loss objective, which is able to outperform even the best sequence-level ramp loss on both weakly supervised tasks.
Multi-Task Networks With Universe, Group, and Task Feature Learning
Pentyala, Shiva, Liu, Mengwen, Dreyer, Markus
We present methods for multi-task learning that take advantage of natural groupings of related tasks. Task groups may be defined along known properties of the tasks, such as task domain or language. Such task groups represent supervised information at the inter-task level and can be encoded into the model. We investigate two variants of neural network architectures that accomplish this, learning different feature spaces at the levels of individual tasks, task groups, as well as the universe of all tasks: (1) parallel architectures encode each input simultaneously into feature spaces at different levels; (2) serial architectures encode each input successively into feature spaces at different levels in the task hierarchy. We demonstrate the methods on natural language understanding (NLU) tasks, where a grouping of tasks into different task domains leads to improved performance on ATIS, Snips, and a large in-house dataset.
On the Weaknesses of Reinforcement Learning for Neural Machine Translation
Choshen, Leshem, Fox, Lior, Aizenbud, Zohar, Abend, Omri
Reinforcement learning (RL) is frequently used to increase performance in text generation tasks, including machine translation (MT), notably through the use of Minimum Risk Training (MRT) and Generative Adversarial Networks (GAN). However, little is known about what and how these methods learn in the context of MT. We prove that one of the most common RL methods for MT does not optimize the expected reward, and show that other methods take an infeasibly long time to converge. In fact, our results suggest that RL practices in MT are likely to improve performance only where the pre-trained parameters are already close to yielding the correct translation. Our findings further suggest that observed gains may be due not to the training signal but rather to changes in the shape of the distribution curve.
Compensating for NLP's Lack of Understanding
The saying "a picture is worth a thousand words" does something of an injustice to the medium of language. It suggests that words are an inefficient form of communication when in fact the opposite is true. When humans use language to communicate, so much is left out because the speaker and listener share experience of the same world, which makes explicit statements about that shared world unnecessary in everyday speech. For example, if I say to you "the vase is on its side, rolling along the table," I don't need to also tell you that the vase is made of fragile stuff (it's a reasonable assumption that it is), or that the table doesn't have edges that will stop the vase's rolling, or that as a result the vase will likely roll off the table, or that gravity will cause the vase to fall to the floor, which is hard and will therefore cause the fragile vase to shatter. It's enough for me to say "the vase is on its side, rolling along the table" for you to know the vase will likely smash to pieces unless someone intervenes.
How artificial intelligence is powering cybersecurity
Technology progresses daily, and with this progression come threats and risks to social, financial and economic life. Today, cyber-attackers have resorted to the use of automation to launch more frequent attacks on different businesses and corporations. While cyber-attackers are expending a lot of resources to launch more sophisticated attacks, many organizations still rely on manual efforts to gather internal security findings and contextualize them with external threat information. It was reported that employee carelessness was the reason behind ransomware attacks in 51 percent of cases. Such outdated methods and strategies need to make way for AI because they consume a lot of time, during which cyber-attackers can successfully take advantage of vulnerabilities to breach systems and steal data.
Translationese in Machine Translation Evaluation
Graham, Yvette, Haddow, Barry, Koehn, Philipp
The term translationese has been used to describe the presence of unusual features of translated text. In this paper, we provide a detailed analysis of the adverse effects of translationese on machine translation evaluation results. Our analysis shows evidence to support differences in text originally written in a given language relative to translated text, and this can potentially negatively impact the accuracy of machine translation evaluations. For this reason we recommend that reverse-created test data be omitted from future machine translation test sets. In addition, we provide a re-evaluation of a past high-profile machine translation evaluation claiming human-parity of MT, as well as analysis of the subsequent re-evaluations of it. We find potential ways of improving the reliability of all three past evaluations. One important issue not previously considered is the statistical power of significance tests applied in past evaluations that aim to investigate human-parity of MT. Since the very aim of such evaluations is to reveal legitimate ties between human and MT systems, power analysis is of particular importance, where low power could result in claims of human parity that in fact simply correspond to Type II error. We therefore provide a detailed power analysis of tests used in such evaluations to provide an indication of a suitable minimum sample size of translations for such studies. Subsequently, since no past evaluation that aimed to investigate claims of human parity ticks all boxes in terms of accuracy and reliability, we rerun the evaluation of the systems claiming human parity. Finally, we provide a comprehensive check-list for future machine translation evaluation.
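To make the link between statistical power and minimum sample size concrete, the sketch below computes the power of a two-sided z-test under a normal approximation and searches for the smallest sample size that reaches a target power. This is a textbook approximation, not the specific tests or procedure analysed in the paper. The function names and the standardized effect size of 0.2 are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist


def z_test_power(effect_size, n, alpha=0.05):
    """Power of a two-sided one-sample z-test for a standardized effect size and n samples."""
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)      # two-sided critical value
    shift = effect_size * math.sqrt(n)   # mean of the test statistic under the alternative
    # Probability that the statistic falls in either rejection region.
    return z.cdf(shift - crit) + z.cdf(-shift - crit)


def min_sample_size(effect_size, target_power=0.8, alpha=0.05):
    """Smallest n whose power reaches target_power (simple linear search)."""
    n = 1
    while z_test_power(effect_size, n, alpha) < target_power:
        n += 1
    return n


if __name__ == "__main__":
    # Hypothetical standardized difference between human and MT scores.
    n = min_sample_size(effect_size=0.2)
    print(n, round(z_test_power(0.2, n), 4))
```

A test run with fewer samples than this minimum will often fail to detect a real difference of that size, which is the Type II error the abstract warns can be mistaken for human parity.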
Sequence Generation: From Both Sides to the Middle
Zhou, Long, Zhang, Jiajun, Zong, Chengqing, Yu, Heng
The encoder-decoder framework has achieved promising progress for many sequence generation tasks, such as neural machine translation and text summarization. Such a framework usually generates a sequence token by token from left to right, hence (1) this autoregressive decoding procedure is time-consuming when the output sentence becomes longer, and (2) it lacks the guidance of future context, which is crucial to avoid under-translation. To alleviate these issues, we propose a synchronous bidirectional sequence generation (SBSG) model which predicts its outputs from both sides to the middle simultaneously. In the SBSG model, we enable the left-to-right (L2R) and right-to-left (R2L) generation to help and interact with each other by leveraging an interactive bidirectional attention network. Experiments on neural machine translation (En-De, Ch-En, and En-Ro) and text summarization tasks show that the proposed model significantly speeds up decoding while improving the generation quality compared to the autoregressive Transformer.
Retrieving Sequential Information for Non-Autoregressive Neural Machine Translation
Shao, Chenze, Feng, Yang, Zhang, Jinchao, Meng, Fandong, Chen, Xilin, Zhou, Jie
Non-Autoregressive Transformer (NAT) aims to accelerate the Transformer model through discarding the autoregressive mechanism and generating target words independently, which fails to exploit the target sequential information. Over-translation and under-translation errors often occur for the above reason, especially in the long sentence translation scenario. In this paper, we propose two approaches to retrieve the target sequential information for NAT to enhance its translation ability while preserving the fast-decoding property. Firstly, we propose a sequence-level training method based on a novel reinforcement algorithm for NAT (Reinforce-NAT) to reduce the variance and stabilize the training procedure. Secondly, we propose an innovative Transformer decoder named FS-decoder to fuse the target sequential information into the top layer of the decoder. Experimental results on three translation tasks show that the Reinforce-NAT surpasses the baseline NAT system by a significant margin on BLEU without decelerating the decoding speed and the FS-decoder achieves comparable translation performance to the autoregressive Transformer with considerable speedup.