Undirected Networks
Markov Random Fields for Collaborative Filtering
Collaborative filtering has witnessed significant improvements in recent years, largely due to models based on low-dimensional embeddings, like weighted matrix factorization (e.g., [26, 39]) and deep learning [23, 22, 33, 47, 62, 58, 20, 11], including autoencoders [58, 33]. Neighborhood-based approaches are also competitive in certain regimes (e.g., [1, 53, 54]), despite being simple heuristics based on item-item (or user-user) similarity matrices (like cosine similarity). In this paper, we outline that Markov Random Fields (MRF) are closely related to autoencoders as well as to neighborhood-based approaches. We build on the enormous progress made in learning MRFs, in particular in sparse inverse covariance estimation (e.g., [36, 59, 15, 2, 60, 44, 45, 63, 55, 24, 25, 52, 56, 51]). Much of the literature on sparse inverse covariance estimation focuses on the regime where the number of data points n is much smaller than the number of variables m in the model (n ≪ m).
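The neighborhood-based heuristic mentioned above fits in a few lines. The following is a minimal illustration of that baseline, not the paper's MRF method. It assumes a binary user-item interaction matrix stored as a list of user rows, computes an item-item cosine similarity matrix with a zeroed diagonal, and scores items for a user by summing similarities to the items that user has already interacted with. The function names `item_cosine_similarity` and `score_items` are hypothetical.

```python
import math

def item_cosine_similarity(ratings):
    """Item-item cosine similarity from a users x items interaction matrix.

    `ratings` is a list of user rows; the diagonal is set to zero so an item
    never counts as its own neighbor.
    """
    m = len(ratings[0])
    cols = [[row[j] for row in ratings] for j in range(m)]  # one vector per item
    norms = [math.sqrt(sum(x * x for x in col)) for col in cols]
    sim = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            if i != j and norms[i] > 0 and norms[j] > 0:
                dot = sum(a * b for a, b in zip(cols[i], cols[j]))
                sim[i][j] = dot / (norms[i] * norms[j])
    return sim

def score_items(user_row, sim):
    """Score item j for one user as the sum of similarities to the user's items."""
    m = len(user_row)
    return [sum(user_row[k] * sim[k][j] for k in range(m)) for j in range(m)]
```

For the toy matrix `[[1, 1, 0], [1, 1, 1], [0, 1, 1]]`, a user who has interacted only with item 0 gets scores of roughly `[0.0, 0.816, 0.5]`, so item 1 is ranked first among the unseen items. Zeroing the diagonal keeps an item from trivially recommending itself.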
Collaborative Graph Walk for Semi-supervised Multi-Label Node Classification
Akujuobi, Uchenna, Yufei, Han, Zhang, Qiannan, Zhang, Xiangliang
Abstract: In this work, we study the semi-supervised multi-label node classification problem in attributed graphs. Classic solutions to multi-label node classification follow two steps: first learn node embeddings, then build a node classifier on the learned embeddings. To improve the discriminating power of the node embedding, we propose a novel collaborative graph walk, named Multi-Label-Graph-Walk, to finely tune node representations with the available label assignments in attributed graphs via reinforcement learning. The proposed method formulates the multi-label node classification task as simultaneous graph walks conducted by multiple label-specific agents. Furthermore, the policies of the label-wise graph walks are learned cooperatively to capture, first, the predictive relation between node labels and the structural attributes of graphs and, second, the correlation among the multiple label-specific classification tasks. A comprehensive experimental study demonstrates that the proposed method can achieve significantly better multi-label classification performance than state-of-the-art approaches and can explore graphs more efficiently.
Index Terms: Multi-label node classification, Semi-supervised attributed graph embedding, Reinforcement learning
I. INTRODUCTION
Graph-structured data are frequently encountered in many real-world applications, such as social graphs and academic graphs. In a graph, nodes represent entities (e.g., users in social graphs and papers in citation graphs), whereas edges linking two nodes denote relationships between the entities (e.g., user friendship and paper citation). Usually, both nodes and edges possess their own attributes.
Collapsed Amortized Variational Inference for Switching Nonlinear Dynamical Systems
Dong, Zhe, Seybold, Bryan A., Murphy, Kevin P., Bui, Hung H.
We propose an efficient inference method for switching nonlinear dynamical systems. The key idea is to learn an inference network which can be used as a proposal distribution for the continuous latent variables, while performing exact marginalization of the discrete latent variables. This allows us to use the reparameterization trick, and apply end-to-end training with stochastic gradient descent. We show that the proposed method can successfully segment time series data (including videos) into meaningful "regimes", by using the piece-wise nonlinear dynamics.
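The exact marginalization of the discrete latent variables can be illustrated with the standard forward algorithm. In this minimal sketch (an assumed setup, not the authors' code), the continuous latents are taken as already sampled from the inference network, so each discrete regime k contributes a precomputed log-likelihood `log_emit[t][k]`, standing for log p(z_t | z_{t-1}, s_t = k). The regime sequence is then summed out exactly in log space using a hypothetical Markov transition matrix `log_trans` and initial distribution `log_init`. Because no discrete variable is sampled, gradients only need to flow through the reparameterized continuous samples.

```python
import math

def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))

def log_marginal_discrete(log_init, log_trans, log_emit):
    """Forward algorithm: log p(z_{1:T}) with the discrete regimes s_{1:T} summed out.

    log_init[k]     = log p(s_1 = k)
    log_trans[j][k] = log p(s_t = k | s_{t-1} = j)
    log_emit[t][k]  = log-likelihood of step t's continuous latent under regime k
    """
    K = len(log_init)
    # alpha[k] = log p(z_{1:t}, s_t = k)
    alpha = [log_init[k] + log_emit[0][k] for k in range(K)]
    for t in range(1, len(log_emit)):
        alpha = [
            logsumexp([alpha[j] + log_trans[j][k] for j in range(K)]) + log_emit[t][k]
            for k in range(K)
        ]
    return logsumexp(alpha)
```

The cost is O(T K²), compared with O(K^T) for enumerating every regime sequence explicitly.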
Multi-Resolution Weak Supervision for Sequential Data
Sala, Frederic, Varma, Paroma, Fries, Jason, Fu, Daniel Y., Sagawa, Shiori, Khattar, Saelig, Ramamoorthy, Ashwini, Xiao, Ke, Fatahalian, Kayvon, Priest, James, Ré, Christopher
Since manually labeling training data is slow and expensive, recent industrial and scientific research efforts have turned to weaker or noisier forms of supervision sources. However, existing weak supervision approaches fail to model multi-resolution sources for sequential data, like video, that can assign labels to individual elements or collections of elements in a sequence. A key challenge in weak supervision is estimating the unknown accuracies and correlations of these sources without using labeled data. Multi-resolution sources exacerbate this challenge due to complex correlations and sample complexity that scales with the length of the sequence. We propose Dugong, the first framework to model multi-resolution weak supervision sources with complex correlations to assign probabilistic labels to training data. Theoretically, we prove that Dugong, under mild conditions, can uniquely recover the unobserved accuracy and correlation parameters and use parameter sharing to improve sample complexity. Our method assigns clinician-validated labels to population-scale biomedical video repositories, helping outperform traditional supervision by 36.8 F1 points and addressing a key use case where machine learning has been severely limited by the lack of expert labeled data. On average, Dugong improves over traditional supervision by 16.0 F1 points and existing weak supervision approaches by 24.2 F1 points across several video and sensor classification tasks.
Aggregated Gradient Langevin Dynamics
Zhang, Chao, Xie, Jiahao, Shen, Zebang, Zhao, Peilin, Zhou, Tengfei, Qian, Hui
In this paper, we explore a general Aggregated Gradient Langevin Dynamics (AGLD) framework for Markov Chain Monte Carlo (MCMC) sampling. We investigate the nonasymptotic convergence of AGLD with a unified analysis for different data-access strategies (e.g., random access, cyclic access and random reshuffle) and snapshot-updating strategies, under both convex and nonconvex settings. This is the first time that bounds for I/O-friendly strategies such as cyclic access and random reshuffle have been established in the MCMC literature. The theoretical results also indicate that methods in AGLD possess the merits of both low per-iteration computational complexity and short mixing time. Empirical studies demonstrate that our framework can be used to derive novel schemes that generate high-quality samples for large-scale Bayesian posterior learning tasks.
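To make the aggregated-gradient idea concrete, here is a minimal sketch of one member of this family, a SAGA-style Langevin update, chosen here for illustration; the paper's framework also covers other data-access and snapshot-updating variants. The sampler keeps a table of the most recently computed per-example gradients, combines one fresh per-example gradient with that table to form an unbiased, variance-reduced estimate of the full gradient at O(1) gradient cost per step, and then takes an unadjusted Langevin step. The scalar parameter, the Gaussian toy target, and the names `saga_langevin` and `grad_fi` are assumptions made for the example.

```python
import math
import random

def saga_langevin(grad_fi, n, theta0, step, num_steps, seed=0):
    """SAGA-style aggregated-gradient Langevin dynamics for a scalar parameter.

    Targets the density proportional to exp(-sum_i f_i(theta)), where
    grad_fi(i, theta) returns the derivative of the i-th term f_i at theta.
    """
    rng = random.Random(seed)
    theta = theta0
    table = [grad_fi(i, theta) for i in range(n)]  # last stored gradient of each f_i
    table_sum = sum(table)
    samples = []
    for t in range(num_steps):
        i = rng.randrange(n)  # random access; cyclic access would use i = t % n
        g_new = grad_fi(i, theta)
        # unbiased, variance-reduced estimate of the full gradient sum_i f_i'(theta)
        g = n * (g_new - table[i]) + table_sum
        # refresh the aggregate and the stored gradient for example i
        table_sum += g_new - table[i]
        table[i] = g_new
        # unadjusted Langevin step: gradient step plus Gaussian noise of variance 2*step
        theta = theta - step * g + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
        samples.append(theta)
    return samples
```

With data `x = [0, 1, ..., 9]` and f_i(θ) = (θ - x_i)²/2, the target is a Gaussian with mean 4.5 and variance 0.1, so after discarding a burn-in period the sample mean and variance should land close to those values (the variance is slightly inflated by discretization and gradient noise). Drawing indices as `t % n` instead of `rng.randrange(n)` turns this into cyclic access.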
Dealing with Sparse Rewards in Reinforcement Learning
Successfully navigating a complex environment to obtain a desired outcome is a difficult task that, until recently, was believed to be achievable only by humans. This perception has broken down over time, especially with the introduction of deep reinforcement learning, which has greatly increased the difficulty of tasks that can be automated. However, traditional reinforcement learning agents require the environment to provide frequent extrinsic rewards, which are not known or accessible in many real-world environments. This project aims to explore and contrast existing reinforcement learning solutions that circumvent the difficulties of environments that provide sparse rewards. Different reinforcement learning solutions will be implemented over several video game environments with varying difficulty and varying frequency of rewards, so as to properly investigate the applicability of these solutions. This project also introduces a novel reinforcement learning solution that combines aspects of two existing state-of-the-art sparse-reward solutions.
Perception-Distortion Trade-off with Restricted Boltzmann Machines
Cannella, Chris, Ding, Jie, Soltani, Mohammadreza, Tarokh, Vahid
For example, we might expect to encounter sensor malfunctions in a wireless sensor network at a rate proportional to the size of the network. Therefore, there is a growing need to develop machine learning techniques that enable satisfactory training and inference from incomplete data. Imputation, where missing data values are filled with suitable values inferred from observations, represents a promising technique for extending machine learning methods to handle missing data. Given their explicit representation of underlying data distributions, Restricted Boltzmann Machines (RBMs) are an appealing choice for imputing missing values. With a well-trained RBM, the conditional probabilities of the missing values given the observed values remain accessible via either direct calculation (in a theoretical sense) or indirect Gibbs sampling. A variety of training and imputation procedures have been proposed to allow the application of RBMs to missing data, with varying computational costs.
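The two routes to the conditional distribution of missing values, direct calculation and Gibbs sampling, can be compared on a toy binary RBM. This is a minimal sketch under stated assumptions, not a procedure from the paper: the weights `W`, visible biases `b`, and hidden biases `c` used with it are arbitrary hypothetical values rather than a trained model. `exact_conditional` does the direct calculation by enumerating all settings of the missing units and weighting each completion by exp(-F(v)), where F is the RBM free energy; this is only feasible for a handful of missing units. `gibbs_impute` clamps the observed units, alternates between sampling the hidden layer and resampling only the missing visible units, and averages p(v_k = 1 | h) as its estimate.

```python
import math
import random
from itertools import product

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def hidden_input(v, W, c, j):
    # pre-activation of hidden unit j: c_j + sum_i v_i W_ij
    return c[j] + sum(v[i] * W[i][j] for i in range(len(v)))

def free_energy(v, W, b, c):
    # F(v) = -sum_i b_i v_i - sum_j log(1 + exp(c_j + sum_i v_i W_ij))
    return (-sum(bi * vi for bi, vi in zip(b, v))
            - sum(math.log1p(math.exp(hidden_input(v, W, c, j))) for j in range(len(c))))

def exact_conditional(v, missing, W, b, c):
    """P(v_k = 1 | observed units) for each k in `missing`, by brute-force enumeration."""
    v = list(v)
    totals, norm = [0.0] * len(missing), 0.0
    for bits in product([0, 1], repeat=len(missing)):
        for k, bit in zip(missing, bits):
            v[k] = bit
        w = math.exp(-free_energy(v, W, b, c))  # unnormalized p(v)
        norm += w
        for idx, bit in enumerate(bits):
            totals[idx] += w * bit
    return [t / norm for t in totals]

def gibbs_impute(v, missing, W, b, c, sweeps=20000, burn_in=200, seed=0):
    """Estimate P(v_k = 1 | observed) by Gibbs sampling with observed units clamped."""
    rng = random.Random(seed)
    v = list(v)
    for k in missing:
        v[k] = rng.randint(0, 1)  # random initial guess for the missing units
    totals = [0.0] * len(missing)
    for sweep in range(burn_in + sweeps):
        # sample every hidden unit given the current visible layer
        h = [1 if rng.random() < sigmoid(hidden_input(v, W, c, j)) else 0
             for j in range(len(c))]
        # resample only the missing visible units; observed units stay fixed
        for idx, k in enumerate(missing):
            p = sigmoid(b[k] + sum(W[k][j] * h[j] for j in range(len(c))))
            if sweep >= burn_in:
                totals[idx] += p  # average the conditional probability, not the 0/1 draw
            v[k] = 1 if rng.random() < p else 0
    return [t / sweeps for t in totals]
```

With `W = [[2.0, -1.0], [1.5, 0.5], [-1.0, 2.0]]`, `b = [0.1, -0.2, 0.0]`, `c = [-0.5, 0.3]`, observed `v[0] = v[1] = 1` and `v[2]` missing, the direct calculation gives P(v_2 = 1 | observed) of about 0.61, and the Gibbs estimate agrees to within Monte Carlo error.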
RLScheduler: Learn to Schedule HPC Batch Jobs Using Deep Reinforcement Learning
Zhang, Di, Dai, Dong, He, Youbiao, Bao, Forrest Sheng
We present RLScheduler, a deep reinforcement learning-based job scheduler for scheduling independent batch jobs in high-performance computing (HPC) environments. Starting with no knowledge of scheduling, RLScheduler autonomously learns how to effectively schedule HPC batch jobs, targeting a given optimization goal. This is achieved by deep reinforcement learning with the help of specially designed neural network structures and various optimizations that stabilize and accelerate the learning. Our results show that RLScheduler can outperform existing heuristic scheduling algorithms, including a manually fine-tuned machine learning-based scheduler, on the same workload. More importantly, we show that RLScheduler does not blindly overfit the given workload to achieve such optimization; instead, it learns general rules for scheduling batch jobs that can be applied to different workloads and systems to achieve similarly optimized performance. We also demonstrate that RLScheduler is capable of adjusting itself to changing goals and workloads, making it an attractive solution for future autonomous HPC management.
Neuro-SERKET: Development of Integrative Cognitive System through the Composition of Deep Probabilistic Generative Models
Taniguchi, Tadahiro, Nakamura, Tomoaki, Suzuki, Masahiro, Kuniyasu, Ryo, Hayashi, Kaede, Taniguchi, Akira, Horii, Takato, Nagai, Takayuki
This paper describes Neuro-SERKET, a framework for the development of an integrative cognitive system based on probabilistic generative models (PGMs). Neuro-SERKET is an extension of SERKET, which can compose elemental PGMs developed in a distributed manner and provide a scheme that allows the composed PGMs to learn throughout the system in an unsupervised way. In addition to the head-to-tail connection supported by SERKET, Neuro-SERKET supports tail-to-tail and head-to-head connections, as well as neural network-based modules, i.e., deep generative models. As an example of a Neuro-SERKET application, an integrative model was developed by composing a variational autoencoder (VAE), a Gaussian mixture model (GMM), latent Dirichlet allocation (LDA), and automatic speech recognition (ASR). The model is called VAE+GMM+LDA+ASR. The performance of VAE+GMM+LDA+ASR and the validity of Neuro-SERKET were demonstrated through a multimodal categorization task using image data and a speech signal of numerical digits.
VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning
Zintgraf, Luisa, Shiarlis, Kyriacos, Igl, Maximilian, Schulze, Sebastian, Gal, Yarin, Hofmann, Katja, Whiteson, Shimon
Luisa Zintgraf (University of Oxford), Kyriacos Shiarlis (Latent Logic), Maximilian Igl (University of Oxford), Sebastian Schulze (University of Oxford), Yarin Gal (OATML Group, University of Oxford), Katja Hofmann (Microsoft Research), Shimon Whiteson (University of Oxford, Latent Logic)
ABSTRACT
Trading off exploration and exploitation in an unknown environment is key to maximising expected return during learning. A Bayes-optimal policy, which does so optimally, conditions its actions not only on the environment state but on the agent's uncertainty about the environment. Computing a Bayes-optimal policy is, however, intractable for all but the smallest tasks. In this paper, we introduce variational Bayes-Adaptive Deep RL (variBAD), a way to meta-learn to perform approximate inference in an unknown environment, and incorporate task uncertainty directly during action selection. In a grid-world domain, we illustrate how variBAD performs structured online exploration as a function of task uncertainty. We also evaluate variBAD on MuJoCo domains widely used in meta-RL and show that it achieves higher return during training than existing methods.
1 INTRODUCTION
Reinforcement learning (RL) is typically concerned with finding an optimal policy that maximises expected return for a given Markov decision process (MDP) with an unknown reward and transition function. If these were known, the optimal policy could in theory be computed without interacting with the environment. By contrast, learning in an unknown environment typically requires trading off exploration (learning about the environment) and exploitation (taking promising actions). Balancing this tradeoff is key to maximising expected return during learning. A Bayes-optimal policy, which does so optimally, conditions actions not only on the environment state but on the agent's own uncertainty about the current MDP.
In principle, a Bayes-optimal policy can be computed using the framework of Bayes-adaptive Markov decision processes (BAMDPs) (Martin, 1967; Duff & Barto, 2002). The agent maintains a belief, i.e., a posterior distribution, over possible environments. Augmenting the state space of the underlying MDP with this posterior distribution yields a BAMDP, a special case of a belief MDP (Kaelbling et al., 1998).
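For a small, finite set of candidate environments, the belief can be maintained exactly with Bayes' rule, which makes concrete what the augmented BAMDP state contains. The sketch below assumes a hypothetical 2x2 grid-world whose only unknown is the goal cell, with a deterministic reward of 1 at the goal and 0 elsewhere; `update_belief` and `reward_likelihood` are illustrative names, not part of variBAD. Exact posteriors like this one are what become intractable for larger task spaces, where variBAD instead performs approximate inference.

```python
def update_belief(belief, likelihood):
    """Bayes' rule over candidate MDPs: b'(k) is proportional to b(k) * p(observation | MDP k)."""
    posterior = [p * l for p, l in zip(belief, likelihood)]
    z = sum(posterior)
    if z == 0.0:
        raise ValueError("observation has zero probability under every candidate MDP")
    return [p / z for p in posterior]

def reward_likelihood(cell, reward, goals):
    """p(reward | agent in `cell`, goal g) for each candidate goal g (deterministic rewards assumed)."""
    return [1.0 if (reward == 1) == (cell == g) else 0.0 for g in goals]

goals = [(0, 0), (0, 1), (1, 0), (1, 1)]  # one candidate MDP per possible goal cell
belief = [0.25] * len(goals)              # uniform prior over the candidate MDPs

# The agent visits two cells and receives no reward in either.
for cell, reward in [((0, 1), 0), ((1, 0), 0)]:
    belief = update_belief(belief, reward_likelihood(cell, reward, goals))
    # The BAMDP state augments the environment state with the current belief.
    bamdp_state = (cell, tuple(belief))
```

After both observations, the belief rules out goals (0, 1) and (1, 0) and splits its mass equally between (0, 0) and (1, 1); a Bayes-optimal policy would act on the pair `(cell, belief)` rather than on the cell alone.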