Refining Source Representations with Relation Networks for Neural Machine Translation

arXiv.org Artificial Intelligence

Although neural machine translation (NMT) with the encoder-decoder framework has achieved great success in recent years, it still suffers from some drawbacks: RNNs tend to forget old information that is often useful at the current step, and the encoder operates over individual words without considering the relationships between them. To address these problems, we introduce relation networks (RNs) to learn better representations of the source. In our method, RNs are used to associate source words with each other so that the source representation can memorize all the source words and also capture the relationships between them. The source representations and all the relations are then fed into the attention component during decoding, with the main encoder-decoder architecture unchanged. Experiments on several datasets show that our method significantly improves translation performance over the conventional encoder-decoder model, and can even outperform an approach that uses supervised syntactic knowledge.
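To make the idea concrete, below is a minimal PyTorch sketch of a pairwise relation module over encoder states, in the spirit described in the abstract; the module and parameter names (`RelationNetwork`, `relation_dim`) are illustrative assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn as nn

class RelationNetwork(nn.Module):
    """Illustrative pairwise relation module over encoder states."""
    def __init__(self, hidden_dim, relation_dim):
        super().__init__()
        # g scores each (word_i, word_j) pair; f merges the pooled relations
        # back into each word's representation.
        self.g = nn.Sequential(
            nn.Linear(2 * hidden_dim, relation_dim), nn.ReLU(),
            nn.Linear(relation_dim, relation_dim), nn.ReLU(),
        )
        self.f = nn.Linear(hidden_dim + relation_dim, hidden_dim)

    def forward(self, enc_states):
        # enc_states: (batch, src_len, hidden_dim)
        B, L, H = enc_states.size()
        left = enc_states.unsqueeze(2).expand(B, L, L, H)   # word i
        right = enc_states.unsqueeze(1).expand(B, L, L, H)  # word j
        pair = torch.cat([left, right], dim=-1)             # all (i, j) pairs
        relations = self.g(pair)                            # (B, L, L, relation_dim)
        pooled = relations.mean(dim=2)                      # relations of word i with every j
        # Relation-augmented source representations, to be fed to attention.
        return self.f(torch.cat([enc_states, pooled], dim=-1))

# Example usage (hypothetical sizes): refined = RelationNetwork(512, 256)(encoder_outputs)
```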


Asynchronous Bidirectional Decoding for Neural Machine Translation

AAAI Conferences

The dominant neural machine translation (NMT) models apply unified attentional encoder-decoder neural networks for translation. Traditionally, NMT decoders adopt recurrent neural networks (RNNs) to perform translation in a left-to-right manner, leaving the target-side contexts generated from right to left unexploited during translation. In this paper, we equip the conventional attentional encoder-decoder NMT framework with a backward decoder, in order to explore bidirectional decoding for NMT. Attending to the hidden state sequence produced by the encoder, our backward decoder first learns to generate the target-side hidden state sequence from right to left. Then, the forward decoder performs translation in the forward direction, and at each prediction timestep it simultaneously applies two attention models to consider the source-side and reverse target-side hidden states, respectively. With this new architecture, our model is able to fully exploit source- and target-side contexts to jointly improve translation quality. Experimental results on NIST Chinese-English and WMT English-German translation tasks demonstrate that our model achieves substantial improvements over the conventional NMT model, by 3.14 and 1.38 BLEU points respectively. The source code of this work can be obtained from https://github.com/DeepLearnXMU/ABDNMT.
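The two-attention forward decoder step can be sketched as follows (PyTorch, with plain dot-product attention for brevity); `forward_decoder_step` and its argument names are hypothetical, and the authors' actual implementation lives in the linked repository.

```python
import torch
import torch.nn.functional as F

def attend(query, keys):
    """Dot-product attention: query (B, H), keys (B, T, H) -> context (B, H)."""
    scores = torch.bmm(keys, query.unsqueeze(-1)).squeeze(-1)  # (B, T)
    weights = F.softmax(scores, dim=-1)
    return torch.bmm(weights.unsqueeze(1), keys).squeeze(1)

def forward_decoder_step(cell, state, prev_emb, enc_states, bwd_states):
    """One forward-decoder step that attends to both the source-side encoder
    states and the reverse (right-to-left) target-side decoder states."""
    src_ctx = attend(state, enc_states)   # source-side attention
    bwd_ctx = attend(state, bwd_states)   # reverse target-side attention
    rnn_in = torch.cat([prev_emb, src_ctx, bwd_ctx], dim=-1)
    # cell is e.g. nn.GRUCell(emb_dim + 2 * hidden_dim, hidden_dim)
    return cell(rnn_in, state)
```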


Modeling Event Background for If-Then Commonsense Reasoning Using Context-aware Variational Autoencoder

arXiv.org Artificial Intelligence

To facilitate this, Rashkin et al. (2018) build the Event2Mind dataset and Sap et al. (2018) present the Atomic dataset, which mainly focus on nine If-Then reasoning types describing causes, effects, intents, and participant characteristics of events. Together with these datasets, a simple RNN-based encoder-decoder framework is proposed to conduct If-Then reasoning. However, two challenging problems remain. First, as illustrated in Figure 1, given an event "PersonX finds a new job", PersonX could plausibly have multiple feelings about that event (such as "needy/stressed out" and "relieved/joyful"). Previous work showed that for this one-to-many problem, conventional RNN-based encoder-decoder models tend to generate generic responses rather than meaningful and specific answers (Li et al., 2016; Serban et al., 2016). Second, as a commonsense reasoning problem, rich background knowledge is necessary for generating reasonable inferences. For example, as shown in Figure 1, PersonX could feel many things about the event "PersonX finds a new job"; however, given the context "PersonX was fired", the plausible inferences narrow down to "needy" or "stressed out". To better address these problems, we propose a context-aware variational autoencoder (CWVAE) together with a two-stage training procedure.
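As a rough illustration of the variational component, the sketch below shows a standard conditional-VAE latent layer in which the posterior additionally sees the background context; the class name, layer sizes, and single-stage setup are assumptions for illustration, not the paper's exact CWVAE or its two-stage training procedure.

```python
import torch
import torch.nn as nn

class ContextAwareVAE(nn.Module):
    """Illustrative context-aware CVAE: the prior sees only the event, the
    posterior also sees the background context, and the sampled latent seeds
    the inference decoder (names and sizes are hypothetical)."""
    def __init__(self, hidden_dim, latent_dim):
        super().__init__()
        self.prior = nn.Linear(hidden_dim, 2 * latent_dim)          # event only
        self.posterior = nn.Linear(2 * hidden_dim, 2 * latent_dim)  # event + context
        self.to_decoder = nn.Linear(hidden_dim + latent_dim, hidden_dim)

    def forward(self, event_repr, context_repr=None):
        if context_repr is not None:   # training: posterior from event + context
            mu, logvar = self.posterior(
                torch.cat([event_repr, context_repr], dim=-1)).chunk(2, dim=-1)
        else:                          # inference: fall back to the prior
            mu, logvar = self.prior(event_repr).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        init_state = self.to_decoder(torch.cat([event_repr, z], dim=-1))
        # A KL term between posterior and prior would be added to the training loss.
        return init_state, mu, logvar
```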


Learning to Refine Source Representations for Neural Machine Translation

arXiv.org Artificial Intelligence

Neural machine translation (NMT) models generally adopt an encoder-decoder architecture for modeling the entire translation process. The encoder summarizes the representation of the input sentence from scratch, which is potentially a problem if the sentence is ambiguous. When translating a text, humans often form an initial understanding of the source sentence and then incrementally refine it as the translation proceeds on the target side. Starting from this intuition, we propose a novel encoder-refiner-decoder framework, which dynamically refines the source representations based on the generated target-side information at each decoding step. Since the refining operations are time-consuming, we propose a strategy, leveraging the power of reinforcement learning models, to decide when to refine at specific decoding steps. Experimental results on both Chinese-English and English-German translation tasks show that the proposed approach significantly and consistently improves translation performance over the standard encoder-decoder framework. Furthermore, when the refining strategy is applied, the results still show a reasonable improvement over the baseline without much decrease in decoding speed.
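A minimal sketch of the per-step refining operation, assuming a simple gated update of the source states by the current target-side decoder state (the `Refiner` module and its layers are hypothetical); the reinforcement-learning policy that decides when to refine is not shown.

```python
import torch
import torch.nn as nn

class Refiner(nn.Module):
    """Illustrative refiner: the current decoder state gates an update of the
    source representations before the next attention/decoding step."""
    def __init__(self, hidden_dim):
        super().__init__()
        self.update = nn.Linear(2 * hidden_dim, hidden_dim)
        self.gate = nn.Linear(2 * hidden_dim, hidden_dim)

    def forward(self, src_states, dec_state):
        # src_states: (B, src_len, H); dec_state: (B, H)
        tgt = dec_state.unsqueeze(1).expand_as(src_states)
        both = torch.cat([src_states, tgt], dim=-1)
        g = torch.sigmoid(self.gate(both))          # how much to refine each source word
        refined = torch.tanh(self.update(both))
        return g * refined + (1 - g) * src_states   # refined source representations
```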


Deconvolution-Based Global Decoding for Neural Machine Translation

arXiv.org Artificial Intelligence

A great proportion of sequence-to-sequence (Seq2Seq) models for Neural Machine Translation (NMT) adopt recurrent neural networks (RNNs) to generate the translation word by word in a sequential order. Since linguistic studies have shown that language is not a linear word sequence but a sequence with complex structure, translation at each step should be conditioned on the whole target-side context. To tackle this problem, we propose a new NMT model that decodes the sequence with the guidance of a structural prediction of the target-side context. Our model generates the translation based on this structural prediction, so that the translation is freed from the constraint of sequential order. Experimental results demonstrate that our model is competitive with state-of-the-art methods, and the analysis shows that it is robust to translating sentences of different lengths and reduces repetition by following the guidance of the target-side context during decoding.
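The structural prediction of the target-side context can be illustrated with transposed 1-D convolutions (deconvolutions) that expand the encoder's sentence summary into a fixed number of context vectors for the decoder to attend to; the layer configuration and names below are assumptions for illustration, not the authors' reported setup.

```python
import torch
import torch.nn as nn

class GlobalContextPredictor(nn.Module):
    """Illustrative deconvolution-based predictor of a target-side context:
    expands a sentence summary vector into a sequence of context vectors."""
    def __init__(self, hidden_dim, context_len=16):
        super().__init__()
        self.deconv = nn.Sequential(
            # length 1 -> 4 -> context_len (stride 1, no padding)
            nn.ConvTranspose1d(hidden_dim, hidden_dim, kernel_size=4), nn.ReLU(),
            nn.ConvTranspose1d(hidden_dim, hidden_dim, kernel_size=context_len - 3),
        )

    def forward(self, sent_summary):
        # sent_summary: (B, H), e.g. the encoder's final state
        x = sent_summary.unsqueeze(-1)   # (B, H, 1)
        ctx = self.deconv(x)             # (B, H, context_len)
        return ctx.transpose(1, 2)       # (B, context_len, H), ready for attention
```

During decoding, each step would then attend over these predicted context vectors in addition to the source states, so that word-by-word generation is guided by a global view of the target side.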