Overview
Navigating with Graph Representations for Fast and Scalable Decoding of Neural Language Models
Zhang, Minjia, Wang, Wenhan, Liu, Xiaodong, Gao, Jianfeng, He, Yuxiong
Neural language models (NLMs) have recently gained a renewed interest by achieving state-of-the-art performance across many natural language processing (NLP) tasks. However, NLMs are very computationally demanding largely due to the computational cost of the decoding process, which consists of a softmax layer over a large vocabulary.We observe that in the decoding of many NLP tasks, only the probabilities of the top-K hypotheses need to be calculated preciously and K is often much smaller than the vocabulary size. This paper proposes a novel softmax layer approximation algorithm, called Fast Graph Decoder (FGD), which quickly identifies, for a given context, a set of K words that are most likely to occur according to a NLM. We demonstrate that FGD reduces the decoding time by an order of magnitude while attaining close to the full softmax baseline accuracy on neural machine translation and language modeling tasks. We also prove the theoretical guarantee on the softmax approximation quality.
Adversarial Text Generation via Feature-Mover's Distance
Chen, Liqun, Dai, Shuyang, Tao, Chenyang, Zhang, Haichao, Gan, Zhe, Shen, Dinghan, Zhang, Yizhe, Wang, Guoyin, Zhang, Ruiyi, Carin, Lawrence
Generative adversarial networks (GANs) have achieved significant success in generating real-valued data. However, the discrete nature of text hinders the application of GAN to text-generation tasks. Instead of using the standard GAN objective, we propose to improve text-generation GAN via a novel approach inspired by optimal transport. Specifically, we consider matching the latent feature distributions of real and synthetic sentences using a novel metric, termed the feature-mover's distance (FMD). This formulation leads to a highly discriminative critic and easy-to-optimize objective, overcoming the mode-collapsing and brittle-training problems in existing methods. Extensive experiments are conducted on a variety of tasks to evaluate the proposed model empirically, including unconditional text generation, style transfer from non-parallel text, and unsupervised cipher cracking. The proposed model yields superior performance, demonstrating wide applicability and effectiveness.
SimplE Embedding for Link Prediction in Knowledge Graphs
Kazemi, Seyed Mehran, Poole, David
Knowledge graphs contain knowledge about the world and provide a structured representation of this knowledge. Current knowledge graphs contain only a small subset of what is true in the world. Link prediction approaches aim at predicting new links for a knowledge graph given the existing links among the entities. Tensor factorization approaches have proved promising for such link prediction problems. Proposed in 1927, Canonical Polyadic (CP) decomposition is among the first tensor factorization approaches. CP generally performs poorly for link prediction as it learns two independent embedding vectors for each entity, whereas they are really tied. We present a simple enhancement of CP (which we call SimplE) to allow the two embeddings of each entity to be learned dependently. The complexity of SimplE grows linearly with the size of embeddings. The embeddings learned through SimplE are interpretable, and certain types of background knowledge can be incorporated into these embeddings through weight tying. We prove SimplE is fully expressive and derive a bound on the size of its embeddings for full expressivity. We show empirically that, despite its simplicity, SimplE outperforms several state-of-the-art tensor factorization techniques. SimplE's code is available on GitHub at https://github.com/Mehran-k/SimplE.
Actor-Critic Policy Optimization in Partially Observable Multiagent Environments
Srinivasan, Sriram, Lanctot, Marc, Zambaldi, Vinicius, Perolat, Julien, Tuyls, Karl, Munos, Remi, Bowling, Michael
Optimization of parameterized policies for reinforcement learning (RL) is an important and challenging problem in artificial intelligence. Among the most common approaches are algorithms based on gradient ascent of a score function representing discounted return. In this paper, we examine the role of these policy gradient and actor-critic algorithms in partially-observable multiagent environments. We show several candidate policy update rules and relate them to a foundation of regret minimization and multiagent learning techniques for the one-shot and tabular cases, leading to previously unknown convergence guarantees. We apply our method to model-free multiagent reinforcement learning in adversarial sequential decision problems (zero-sum imperfect information games), using RL-style function approximation. We evaluate on commonly used benchmark Poker domains, showing performance against fixed policies and empirical convergence to approximate Nash equilibria in self-play with rates similar to or better than a baseline model-free algorithm for zero-sum games, without any domain-specific state space reductions.
Continuous-time Value Function Approximation in Reproducing Kernel Hilbert Spaces
Ohnishi, Motoya, Yukawa, Masahiro, Johansson, Mikael, Sugiyama, Masashi
Motivated by the success of reinforcement learning (RL) for discrete-time tasks such as AlphaGo and Atari games, there has been a recent surge of interest in using RL for continuous-time control of physical systems (cf. many challenging tasks in OpenAI Gym and DeepMind Control Suite). Since discretization of time is susceptible to error, it is methodologically more desirable to handle the system dynamics directly in continuous time. However, very few techniques exist for continuous-time RL and they lack flexibility in value function approximation. In this paper, we propose a novel framework for model-based continuous-time value function approximation in reproducing kernel Hilbert spaces. The resulting framework is so flexible that it can accommodate any kind of kernel-based approach, such as Gaussian processes and kernel adaptive filters, and it allows us to handle uncertainties and nonstationarity without prior knowledge about the environment or what basis functions to employ. We demonstrate the validity of the presented framework through experiments.
Bayesian Semi-supervised Learning with Graph Gaussian Processes
Ng, Yin Cheng, Colombo, Nicolò, Silva, Ricardo
We propose a data-efficient Gaussian process-based Bayesian approach to the semi-supervised learning problem on graphs. The proposed model shows extremely competitive performance when compared to the state-of-the-art graph neural networks on semi-supervised learning benchmark experiments, and outperforms the neural networks in active learning experiments where labels are scarce. Furthermore, the model does not require a validation data set for early stopping to control over-fitting. Our model can be viewed as an instance of empirical distribution regression weighted locally by network connectivity. We further motivate the intuitive construction of the model with a Bayesian linear model interpretation where the node features are filtered by an operator related to the graph Laplacian. The method can be easily implemented by adapting off-the-shelf scalable variational inference algorithms for Gaussian processes.
A Deep Bayesian Policy Reuse Approach Against Non-Stationary Agents
ZHENG, YAN, Meng, Zhaopeng, Hao, Jianye, Zhang, Zongzhang, Yang, Tianpei, Fan, Changjie
In multiagent domains, coping with non-stationary agents that change behaviors from time to time is a challenging problem, where an agent is usually required to be able to quickly detect the other agent's policy during online interaction, and then adapt its own policy accordingly. This paper studies efficient policy detecting and reusing techniques when playing against non-stationary agents in Markov games. We propose a new deep BPR+ algorithm by extending the recent BPR+ algorithm with a neural network as the value-function approximator. To detect policy accurately, we propose the \textit{rectified belief model} taking advantage of the \textit{opponent model} to infer the other agent's policy from reward signals and its behaviors. Instead of directly storing individual policies as BPR+, we introduce \textit{distilled policy network} that serves as the policy library in BPR+, using policy distillation to achieve efficient online policy learning and reuse. Deep BPR+ inherits all the advantages of BPR+ and empirically shows better performance in terms of detection accuracy, cumulative rewards and speed of convergence compared to existing algorithms in complex Markov games with raw visual inputs.
Snap ML: A Hierarchical Framework for Machine Learning
Dünner, Celestine, Parnell, Thomas, Sarigiannis, Dimitrios, Ioannou, Nikolas, Anghel, Andreea, Ravi, Gummadi, Kandasamy, Madhusudanan, Pozidis, Haralampos
We describe a new software framework for fast training of generalized linear models. The framework, named Snap Machine Learning (Snap ML), combines recent advances in machine learning systems and algorithms in a nested manner to reflect the hierarchical architecture of modern computing systems. We prove theoretically that such a hierarchical system can accelerate training in distributed environments where intra-node communication is cheaper than inter-node communication. Additionally, we provide a review of the implementation of Snap ML in terms of GPU acceleration, pipelining, communication patterns and software architecture, highlighting aspects that were critical for achieving high performance. We evaluate the performance of Snap ML in both single-node and multi-node environments, quantifying the benefit of the hierarchical scheme and the data streaming functionality, and comparing with other widely-used machine learning software frameworks. Finally, we present a logistic regression benchmark on the Criteo Terabyte Click Logs dataset and show that Snap ML achieves the same test loss an order of magnitude faster than any of the previously reported results, including those obtained using TensorFlow and scikit-learn.
Machine learning in resting-state fMRI analysis
Khosla, Meenakshi, Jamison, Keith, Ngo, Gia H., Kuceyeski, Amy, Sabuncu, Mert R.
Machine learning techniques have gained prominence for the analysis of resting-state functional Magnetic Resonance Imaging (rsfMRI) data. Here, we present an overview of various unsupervised and supervised machine learning applicationsto rsfMRI. We present a methodical taxonomy of machine learning methods in resting-state fMRI. We identify three major divisions of unsupervised learning methods with regard to their applications to rsfMRI, based on whether they discover principal modes of variation across space, time or population. Next, we survey the algorithms and rsfMRI feature representations that have driven the success of supervised subject-level predictions. Thegoal is to provide a high-level overview of the burgeoning field of rsfMRI from the perspective of machine learning applications. Keywords: Machine learning, resting-state, functional MRI, intrinsic networks, brain connectivity 1. Introduction Resting-state fMRI (rsfMRI) is a widely used neuroimaging tool that ...
Thanks to AI, the Year Ahead Will be Highly Visual - RTInsights
AI and machine learning will exploit new hardware and growing capabilities in visual data capture and analysis in the coming year. The year ahead will literally be marked with greater clarity than ever before, thanks to the accelerating convergence of hardware and systems that extend our visual capabilities with artificial intelligence (AI) and machine learning. Thus, one of the most compelling trends for 2019 will be the rise of visually oriented AI as the latest driving force behind the real-time enterprise. In 2018, we saw the introduction of products such as Netvue's Belle, an AI-powered, camera-enabled doorbell that can recognize the identities of guests, and greet them accordingly. Look for more innovative applications along these lines that extend to the business sector -- not only recognizing and greeting customers but also gauging their sentiment.