Neural machine translation is a recently proposed paradigm in machine translation, where a single neural network, often consisting of encoder and decoder recurrent networks, is trained end-to-end to map from a source sentence to its corresponding translation (Bahdanau, Cho, and Bengio 2014; Cho et al. 2014; Sutskever, Vinyals, and Le 2014; Kalchbrenner and Blunsom 2013). The success of neural machine translation, which has already been adopted by major industry players in machine translation (Wu et al. 2016; Crego et al. 2016), is often attributed to the advances in building and training recurrent networks as well as the availability of large-scale parallel corpora for machine translation. A major technical challenge, other than designing such a neural machine translation system, is the scale of a training parallel corpus, which often consists of hundreds of thousands to millions of sentence pairs. We address this issue by incorporating an off-the-shelf black-box search engine into the proposed neural machine translation system. The proposed approach first queries a search engine, which indexes a whole training set, with a given source sentence, and the proposed neural translation system translates the source sentence while incorporating all the retrieved training sentence pairs. In this way, the proposed translation system automatically adapts to
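The retrieval step described above can be sketched in a few lines. This is a hypothetical illustration, not the authors' system: a simple token-overlap (Jaccard) score over a toy training set stands in for the black-box search engine, and `jaccard`, `retrieve`, and the example sentence pairs are all invented for illustration.

```python
# Hypothetical sketch of the retrieval step: query an "index" of the
# training set with a source sentence and return the most similar
# sentence pairs for the translator to condition on. A real system
# would delegate this to an off-the-shelf search engine.

def jaccard(a, b):
    """Token-overlap similarity between two sentences."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb)

# Toy "training set" of (source, target) pairs standing in for the index.
training_pairs = [
    ("the cat sat on the mat", "le chat s'est assis sur le tapis"),
    ("the dog barked loudly", "le chien a aboye fort"),
    ("a cat slept on the sofa", "un chat a dormi sur le canape"),
]

def retrieve(source, pairs, k=2):
    """Return the k training pairs whose source side is most similar."""
    return sorted(pairs, key=lambda p: jaccard(source, p[0]), reverse=True)[:k]

hits = retrieve("the cat sat on the sofa", training_pairs)
# The translator would then condition on both the source sentence
# and the retrieved pairs in `hits`.
```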
Wang, Yijun (University of Science and Technology of China) | Xia, Yingce (University of Science and Technology of China) | Zhao, Li (Microsoft Research Asia) | Bian, Jiang (Microsoft Research Asia) | Qin, Tao (Microsoft Research Asia) | Liu, Guiquan (University of Science and Technology of China) | Liu, Tie-Yan (Microsoft Research Asia)
Neural machine translation (NMT) heavily relies on parallel bilingual data for training. Since large-scale, high-quality parallel corpora are usually costly to collect, it is appealing to exploit monolingual corpora to improve NMT. Inspired by the law of total probability, which connects the probability of a given target-side monolingual sentence to the conditional probability of translating from a source sentence to the target one, we propose to explicitly exploit this connection to learn from and regularize the training of NMT models using monolingual data. The key technical challenge of this approach is that computing this marginal requires summing the conditional probability over exponentially many possible source sentences for each target monolingual sentence. We address this challenge by leveraging the dual translation model (target-to-source translation) to sample several most likely source-side sentences, avoiding the enumeration of all possible candidate source sentences. That is, we transfer the knowledge contained in the dual model to boost the training of the primal model (source-to-target translation), and we call such an approach dual transfer learning. Experimental results on English-French and German-English tasks demonstrate that dual transfer learning achieves significant improvement over several strong baselines and obtains new state-of-the-art results.
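The marginal-probability idea behind this approach can be made concrete on a toy example. The sketch below is a hedged illustration with invented toy numbers, not the authors' training procedure: it shows the law of total probability P(y) = sum_x P(x) P(y|x), and how an importance-sampling estimate that draws candidate sources from a dual-model proposal q(x|y) can approximate the intractable sum; `p_x`, `p_y_given_x`, and `q_x_given_y` are all hypothetical distributions.

```python
# Toy illustration of estimating the marginal P(y) without enumerating
# every source sentence x. All probabilities here are made-up numbers
# over a three-sentence "source vocabulary".
#   Exact:    P(y) = sum_x P(x) * P(y|x)
#   Sampled:  P(y) ~ (1/K) * sum_{x ~ q(.|y)} P(x) * P(y|x) / q(x|y)

p_x = {"x1": 0.6, "x2": 0.3, "x3": 0.1}          # source-side prior P(x)
p_y_given_x = {"x1": 0.5, "x2": 0.2, "x3": 0.9}  # primal model P(y|x)
q_x_given_y = {"x1": 0.7, "x2": 0.2, "x3": 0.1}  # dual-model proposal q(x|y)

# Exact marginal by enumeration (feasible only in this toy setting).
exact = sum(p_x[x] * p_y_given_x[x] for x in p_x)

# Importance-sampling estimate using a handful of draws from q(x|y);
# the weight P(x) * P(y|x) / q(x|y) corrects for the proposal bias.
samples = ["x1", "x1", "x2", "x3", "x1", "x2"]   # pretend draws from q
estimate = sum(
    p_x[x] * p_y_given_x[x] / q_x_given_y[x] for x in samples
) / len(samples)
```

With only a few samples the estimate already lands close to the exact marginal, which is what makes sampling from the dual model a practical substitute for the exponential sum.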
Existing document-level neural machine translation (NMT) models have sufficiently explored different context settings to provide guidance for target generation. However, little attention has been paid to introducing more diverse context to enrich the available context information. In this paper, we propose a Selective Memory-augmented Neural Document Translation model to deal with documents whose context has a large hypothesis space. Specifically, we retrieve similar bilingual sentence pairs from the training corpus to augment the global context and then extend the two-stream attention model with a selective mechanism to capture local context and diverse global contexts. This unified approach allows our model to be trained elegantly on three public document-level machine translation datasets, and it significantly outperforms previous document-level NMT models.
There are thousands of languages on earth, but visual perception is shared among peoples. Existing multimodal neural machine translation (MNMT) methods achieve knowledge transfer by enforcing one encoder to learn shared representations across textual and visual modalities. However, the training and inference process heavily relies on well-aligned bilingual sentence-image triplets as input, which are often limited in quantity. In this paper, we hypothesize that visual imagination, i.e., synthesizing a visual representation from the source text, could help the neural model map two languages with different symbols, thus helping the translation task. Our proposed end-to-end imagination-based machine translation model (ImagiT) first learns to generate a semantically consistent visual representation from the source sentence, and then generates the target sentence based on both the text representation and the imagined visual representation. Experiments demonstrate that our translation model benefits from visual imagination and significantly outperforms the text-only neural machine translation (NMT) baseline. We also conduct analysis experiments, and the results show that imagination can help fill in missing information when performing the degradation strategy.
Attacking neural machine translation models is an inherently combinatorial task on discrete sequences, solved with approximate heuristics. Most methods use the gradient to attack the model on each sample independently. Instead of mechanically applying the gradient, could we learn to produce meaningful adversarial attacks? In contrast to existing approaches, we learn to attack a model by training an adversarial generator based on a language model. We propose the Masked Adversarial Generation (MAG) model, which learns to perturb the translation model throughout the training process. The experiments show that it improves the robustness of machine translation models while being faster than competing methods.