Machine Translation
Amortized Context Vector Inference for Sequence-to-Sequence Networks
Chatzis, Sotirios, Charalampous, Aristotelis, Tolias, Kyriacos, Vassou, Sotiris A.
Neural attention (NA) is an effective mechanism for inferring complex structural data dependencies that span long temporal horizons. As a consequence, it has become a key component of sequence-to-sequence models that yield state-of-the-art performance on tasks as hard as abstractive document summarization (ADS), machine translation (MT), and video captioning (VC). NA mechanisms perform inference of context vectors; these constitute weighted sums of deterministic input sequence encodings, adaptively sourced over long temporal horizons. However, recent work in the field of amortized variational inference (AVI) has shown that it is often useful to treat the representations generated by deep networks as latent random variables. This allows the models to better explore the space of possible representations. Based on this motivation, in this work we introduce a novel perspective on a popular NA mechanism, namely soft-attention (SA). Our approach treats the context vectors generated by SA models as latent variables, the posteriors of which are inferred by employing AVI. Both the means and the covariance matrices of the inferred posteriors are parameterized via deep network mechanisms similar to those employed in the context of standard SA. To illustrate our method, we implement it in the context of popular sequence-to-sequence model variants with SA. We conduct an extensive experimental evaluation using challenging ADS, VC, and MT benchmarks, and show how our approach compares to the baselines.
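The following is a minimal sketch, not the authors' implementation, of the two ideas the abstract describes: a soft-attention context vector computed as a weighted sum of encoder states, and a context vector treated as a Gaussian latent variable that is sampled with the reparameterization trick. The diagonal covariance, the function names, and the toy numbers are all assumptions made for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def context_vector(encodings, scores):
    """Standard soft attention: weighted sum of the input encodings."""
    weights = softmax(scores)
    dim = len(encodings[0])
    return [sum(w * h[d] for w, h in zip(weights, encodings)) for d in range(dim)]


def sample_context(mean, log_var, rng):
    """Draw c = mean + sigma * eps with eps ~ N(0, I).

    Assumption: the posterior has a diagonal covariance, stored as
    log-variances. The abstract only says that means and covariances are
    parameterized by attention-like networks.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mean, log_var)]


# Toy usage: three encoder states of dimension 2 (hypothetical values).
encodings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mean_scores = [0.2, 1.5, -0.3]      # would come from an attention network
log_var = [-2.0, -2.0]              # would come from a second network
mean = context_vector(encodings, mean_scores)
sample = sample_context(mean, log_var, random.Random(0))
print(mean, sample)
```

In a deterministic SA model only `mean` would be passed to the decoder; in the latent-variable view the decoder would consume `sample` during training, with a KL term added to the objective.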
Overhyping #AI #doctors, #language #translation goes open source, and new #jobs on the cards - Walker TechArts
Source: Overhyping AI doctors, language translation goes open source, and new jobs on the cards • The Register. Here's a quick roundup to keep you updated on what's been happening in AI, beyond what we've already covered, for your long weekend. It includes news of Samsung and Qualcomm setting up new AI research teams, why human radiologists are still better than machines and support for Amazon's Keras-MXNet backend. Hold your horses, AI radiologists: People are quick to believe that machines will soon replace radiologists because they think computers are much better at spotting abnormalities like tumors or clots in medical scans. But results reported by Stanford University show that radiologists still trump AI.
Fast Locality Sensitive Hashing for Beam Search on GPU
Shi, Xing, Xu, Shizhen, Knight, Kevin
We present a GPU-based Locality Sensitive Hashing (LSH) algorithm to speed up beam search for sequence models. We utilize the winner-take-all (WTA) hash, which is based on the relative ranking order of hidden dimensions and is thus resilient to perturbations in numerical values. Our algorithm is designed by fully considering the underlying architecture of CUDA-enabled GPUs (Algorithm/Architecture Co-design): 1) A parallel Cuckoo hash table is applied for LSH code lookup (guaranteed O(1) lookup time); 2) Candidate lists are shared across beams to maximize the parallelism; 3) Top frequent words are merged into candidate lists to improve performance. Experiments on 4 large-scale neural machine translation models demonstrate that our algorithm can achieve up to 4x speedup on the softmax module, and 2x overall speedup without hurting BLEU on GPU.
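As an illustration of why a rank-based hash is resilient to numerical perturbation, here is a small CPU-only sketch of a winner-take-all hash in plain Python. It is not the GPU algorithm from the paper: the function names, the window size, and the number of hashes are assumptions chosen for the example, and the Cuckoo table and beam sharing are omitted.

```python
import random


def make_permutations(dim, num_hashes, window, seed=0):
    """Pre-draw, for each hash, the first `window` indices of a random permutation."""
    rng = random.Random(seed)
    perms = []
    for _ in range(num_hashes):
        perm = list(range(dim))
        rng.shuffle(perm)
        perms.append(perm[:window])
    return perms


def wta_hash(vector, perms):
    """For each index window, record the position of the largest value.

    Only the relative order of the selected dimensions matters, so any
    order-preserving change to the values leaves the code unchanged.
    """
    code = []
    for perm in perms:
        values = [vector[i] for i in perm]
        code.append(values.index(max(values)))
    return tuple(code)


# Toy usage with a hypothetical 6-dimensional hidden state.
perms = make_permutations(dim=6, num_hashes=4, window=3, seed=0)
hidden = [0.3, -1.2, 2.5, 0.7, 0.1, -0.4]
rescaled = [2 * x + 1 for x in hidden]  # order-preserving perturbation
print(wta_hash(hidden, perms) == wta_hash(rescaled, perms))
```

Vectors whose codes collide in a hash table become the candidate set, so the softmax only needs to be evaluated over those candidates rather than the full vocabulary.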
Shared Task - The 2nd Workshop on Neural Machine Translation and Generation
Efficiency track: We will have a track where the models that perform at least as well as the baseline attempt to create the most efficient implementation. Here, the winner will be the system that achieves a baseline BLEU score with the highest efficiency, in terms of either memory or computation. Accuracy track: We will have a track where models that are at least as efficient as the baseline attempt to improve the BLEU score. Here, the winner will be the system that can improve accuracy the most without a decrease in efficiency.
A Survey of Domain Adaptation for Neural Machine Translation
Neural machine translation (NMT) is a deep learning based approach for machine translation, which yields state-of-the-art translation performance in scenarios where large-scale parallel corpora are available. Although high-quality, domain-specific translation is crucial in the real world, domain-specific corpora are usually scarce or nonexistent, and thus vanilla NMT performs poorly in such scenarios. Domain adaptation that leverages both out-of-domain parallel corpora and monolingual corpora for in-domain translation is very important for domain-specific translation. In this paper, we give a comprehensive survey of the state-of-the-art domain adaptation techniques for NMT.
Natural Language Generation for Electronic Health Records
A variety of methods exist for generating synthetic electronic health records (EHRs), but they are not capable of generating unstructured text, like emergency department (ED) chief complaints, history of present illness or progress notes. Here, we use the encoder-decoder model, a deep learning algorithm that features in many contemporary machine translation systems, to generate synthetic chief complaints from discrete variables in EHRs, like age group, gender, and discharge diagnosis. After being trained end-to-end on authentic records, the model can generate realistic chief complaint text that preserves much of the epidemiological information in the original data. As a side effect of the model's optimization goal, these synthetic chief complaints are also free of relatively uncommon abbreviations and misspellings, and they include none of the personally-identifiable information (PII) that was in the training data, suggesting it may be used to support the de-identification of text in EHRs. When combined with algorithms like generative adversarial networks (GANs), our model could be used to generate fully-synthetic EHRs, facilitating data sharing between healthcare providers and researchers and improving our ability to develop machine learning methods tailored to the information in healthcare data.
1 Introduction
The wide adoption of electronic health record (EHR) systems has led to the creation of large amounts of healthcare data. These data are primarily used to improve patient outcomes and streamline the delivery of care (healthit.gov). Because they contain personally identifiable patient information, however, much of which is protected under the Health Insurance Portability and Accountability Act (HIPAA), these data are often difficult for providers to share with investigators outside their organizations, limiting their feasibility for use in research.
A Stochastic Decoder for Neural Machine Translation
Schulz, Philip, Aziz, Wilker, Cohn, Trevor
The process of translation is ambiguous, in that there are typically many valid translations for a given sentence. This gives rise to significant variation in parallel corpora; however, most current models of machine translation do not account for this variation, instead treating the problem as a deterministic process. To this end, we present a deep generative model of machine translation which incorporates a chain of latent variables, in order to account for local lexical and syntactic variation in parallel corpora. We provide an in-depth analysis of the pitfalls encountered in variational inference for training deep generative models. Experiments on several different language pairs demonstrate that the model consistently improves over strong baselines.
Reliability and Learnability of Human Bandit Feedback for Sequence-to-Sequence Reinforcement Learning
Kreutzer, Julia, Uyheng, Joshua, Riezler, Stefan
We present a study on reinforcement learning (RL) from human bandit feedback for sequence-to-sequence learning, exemplified by the task of bandit neural machine translation (NMT). We investigate the reliability of human bandit feedback, and analyze the influence of reliability on the learnability of a reward estimator, and the effect of the quality of reward estimates on the overall RL task. Our analysis of cardinal (5-point ratings) and ordinal (pairwise preferences) feedback shows that their intra- and inter-annotator $\alpha$-agreement is comparable. Best reliability is obtained for standardized cardinal feedback, and cardinal feedback is also easiest to learn and generalize from. Finally, improvements of over 1 BLEU can be obtained by integrating a regression-based reward estimator trained on cardinal feedback for 800 translations into RL for NMT. This shows that RL is possible even from small amounts of fairly reliable human feedback, pointing to a great potential for applications at larger scale.
Deep Graph Translation
Guo, Xiaojie, Wu, Lingfei, Zhao, Liang
Inspired by the tremendous success of deep generative models in generating continuous data like images and audio, a few deep graph generative models have been proposed in the past year to generate discrete data such as graphs. They are typically unconditioned generative models, which have no control over the modes of the graphs being generated. In contrast, in this paper we are interested in a new problem named Deep Graph Translation: given an input graph, we want to infer a target graph based on their underlying (both global and local) translation mapping. Graph translation could be highly desirable in many applications such as disaster management and rare event forecasting, where the rare and abnormal graph patterns (e.g., traffic congestion and terrorism events) will be inferred prior to their occurrence even without historical data on the abnormal patterns for this graph (e.g., a road network or human contact network). To achieve this, we propose a novel Graph-Translation-Generative Adversarial Network (GT-GAN), which generates a graph translator from input to target graphs. GT-GAN consists of a graph translator, for which we propose new graph convolution and deconvolution layers to learn the global and local translation mapping. A new conditional graph discriminator is also proposed to classify target graphs by conditioning on input graphs. Extensive experiments on multiple synthetic and real-world datasets demonstrate the effectiveness and scalability of the proposed GT-GAN.
Refining Source Representations with Relation Networks for Neural Machine Translation
Zhang, Wen, Hu, Jiawei, Feng, Yang, Liu, Qun
Although neural machine translation (NMT) with the encoder-decoder framework has achieved great success in recent times, it still suffers from some drawbacks: RNNs tend to forget old information that is often useful in the current step, and the encoder only operates over words without considering the relationships between them. To solve these problems, we introduce relation networks (RNs) to learn better representations of the source. In our method, RNs are used to associate source words with each other so that the source representation can memorize all the source words and also contain the relationships between them. Then the source representations and all the relations are fed into the attention component together while decoding, with the main encoder-decoder architecture unchanged. Experiments on several data sets show that our method can improve the translation performance significantly over the conventional encoder-decoder model, and can even outperform the approach involving supervised syntactic knowledge.