Goto

Collaborating Authors

 Machine Translation


Hint-Based Training for Non-Autoregressive Machine Translation

arXiv.org Machine Learning

Due to the unparallelizable nature of the autoregressive factorization, AutoRegressive Translation (ART) models have to generate tokens sequentially during decoding and thus suffer from high inference latency. Non-AutoRegressive Translation (NART) models were proposed to reduce the inference time, but could only achieve inferior translation accuracy. In this paper, we proposed a novel approach to leveraging the hints from hidden states and word alignments to help the training of NART models. The results achieve significant improvement over previous NART models for the WMT14 En-De and De-En datasets and are even comparable to a strong LSTM-based ART baseline but one order of magnitude faster in inference.


Adaptive Scheduling for Multi-Task Learning

arXiv.org Machine Learning

To train neural machine translation models simultaneously on multiple tasks (languages), it is common to sample each task uniformly or in proportion to dataset sizes. As these methods offer little control over performance trade-offs, we explore different task scheduling approaches. We first consider existing non-adaptive techniques, then move on to adaptive schedules that over-sample tasks with poorer results compared to their respective baseline. As explicit schedules can be inefficient, especially if one task is highly over-sampled, we also consider implicit schedules, learning to scale learning rates or gradients of individual tasks instead. These techniques allow training multilingual models that perform better for low-resource language pairs (tasks with small amount of data), while minimizing negative effects on high-resource tasks.


Entity Projection via Machine Translation for Cross-Lingual NER

arXiv.org Artificial Intelligence

Although over 100 languages are supported by strong off-the-shelf machine translation systems, only a subset of them possess large annotated corpora for named entity recognition. Motivated by this fact, we leverage machine translation to improve annotation-projection approaches to cross-lingual named entity recognition. We propose a system that improves over prior entity-projection methods by: (a) leveraging machine translation systems twice: first for translating sentences and subsequently for translating entities; (b) matching entities based on orthographic and phonetic similarity; and (c) identifying matches based on distributional statistics derived from the dataset. Our approach improves upon current state-of-the-art methods for cross-lingual named entity recognition on 5 diverse languages by an average of 4.1 points. Further, our method achieves state-of-the-art F_1 scores for Armenian, outperforming even a monolingual model trained on Armenian source data.


Self-Attentional Models Application in Task-Oriented Dialogue Generation Systems

arXiv.org Machine Learning

Self-attentional models are a new paradigm for sequence modelling tasks which differ from common sequence modelling methods, such as recurrence-based and convolution-based sequence learning, in the way that their architecture is only based on the attention mechanism. Self-attentional models have been used in the creation of the state-of-the-art models in many NLP tasks such as neural machine translation, but their usage has not been explored for the task of training end-to- end task-oriented dialogue generation systems yet. In this study, we apply these models on the three different datasets for training task-oriented chatbots. Our finding shows that self-attentional models can be exploited to create end-to-end task-oriented chatbots which not only achieve higher evaluation scores compared to recurrence-based models, but also do so more efficiently.


The Unreasonable Effectiveness Of Neural Machine Translation: A Breakthrough In Temporal Expression Understanding

#artificialintelligence

Written by Rakesh Chada and Marcos Jimenez, data scientists at x.ai. At x.ai we strive to make pain associated with scheduling meetings a thing of the past. We've built a virtual assistant (it goes by the name of Amy or Andrew) who can be cc'd into your typical request to meet with people over email. Amy will "understand" the hand-over and just take it from there with your guests, following up with them to nail the time and location details for the meeting. Under the hood this means that Amy must automatically extract meeting-related pieces of information from your email and, mashing that up with your calendar and overall preferences, proceed to get your guests to agree to a time that works for you and them, plus gather whatever other details are needed for the meeting (phone conference number, meeting room, address, google hangout link, etc โ€ฆ). Now the hard, cool, data-science part. Amy "understanding" all the pieces of information from free-form human text presents us with a number of formidable and fascinating data science challenges. This is the realm of natural language processing (NLP), where recent strides in deep learning have made tackling these problems viable. The problem goes far beyond simply detecting words related to times and locations, or named entity recognition (NER).


Practical guide to Attention mechanism for NLU tasks

#artificialintelligence

Chatbots, virtual assistants, augmented analytic systems typically receive user queries such as "Find me an action movie by Steven Spielberg". The system should correctly detect the intent "find_movie" while filling the slots "genre" with value "action" and "directed_by" with value "Steven Spielberg". This is a Natural Language Understanding (NLU) task kown as Intent Classification & Slot Filling. State-of-the-art performance is typically obtained using recurrent neural network (RNN) based approaches, as well as by leveraging an encoder-decoder architecture with sequence-to-sequence models. In this article we demonstrate hands-on strategies for improving the performance even further by adding Attention mechanism.


Problems with automating translation of movie/TV show subtitles

arXiv.org Machine Learning

We present 27 problems encountered in automating the translation of movie/TV show subtitles. We categorize each problem in one of the three categories viz. problems directly related to textual translation, problems related to subtitle creation guidelines, and problems due to adaptability of machine translation (MT) engines. We also present the findings of a translation quality evaluation experiment where we share the frequency of 16 key problems. We show that the systems working at the frontiers of Natural Language Processing do not perform well for subtitles and require some post-processing solutions for redressal of these problems


Answers Unite! Unsupervised Metrics for Reinforced Summarization Models

arXiv.org Artificial Intelligence

Abstractive summarization approaches based on Reinforcement Learning (RL) have recently been proposed to overcome classical likelihood maximization. RL enables to consider complex, possibly non-differentiable, metrics that globally assess the quality and relevance of the generated outputs. ROUGE, the most used summarization metric, is known to suffer from bias towards lexical similarity as well as from suboptimal accounting for fluency and readability of the generated abstracts. W e thus explore and propose alternative evaluation measures: the reported human-evaluation analysis shows that the proposed metrics, based on Question Answering, favorably compares to ROUGE - with the additional property of not requiring reference summaries. Training a RL-based model on these metrics leads to improvements (both in terms of human or automated metrics) over current approaches that use ROUGE as a reward.


On the Downstream Performance of Compressed Word Embeddings

arXiv.org Machine Learning

Compressing word embeddings is important for deploying NLP models in memory-constrained settings. However, understanding what makes compressed embeddings perform well on downstream tasks is challenging---existing measures of compression quality often fail to distinguish between embeddings that perform well and those that do not. We thus propose the eigenspace overlap score as a new measure. We relate the eigenspace overlap score to downstream performance by developing generalization bounds for the compressed embeddings in terms of this score, in the context of linear and logistic regression. We then show that we can lower bound the eigenspace overlap score for a simple uniform quantization compression method, helping to explain the strong empirical performance of this method. Finally, we show that by using the eigenspace overlap score as a selection criterion between embeddings drawn from a representative set we compressed, we can efficiently identify the better performing embedding with up to $2\times$ lower selection error rates than the next best measure of compression quality, and avoid the cost of training a model for each task of interest.


Generating Classical Chinese Poems from Vernacular Chinese

arXiv.org Artificial Intelligence

Classical Chinese poetry is a jewel in the treasure house of Chinese culture. Previous poem generation models only allow users to employ keywords to interfere the meaning of generated poems, leaving the dominion of generation to the model. In this paper, we propose a novel task of generating classical Chinese poems from vernacular, which allows users to have more control over the semantic of generated poems. We adapt the approach of unsupervised machine translation (UMT) to our task. We use segmentation-based padding and reinforcement learning to address under-translation and over-translation respectively. According to experiments, our approach significantly improve the perplexity and BLEU compared with typical UMT models. Furthermore, we explored guidelines on how to write the input vernacular to generate better poems. Human evaluation showed our approach can generate high-quality poems which are comparable to amateur poems.