 Undirected Networks


Variational Measure Preserving Flows

arXiv.org Machine Learning

Probabilistic modelling is a general and elegant framework to capture the uncertainty, ambiguity and diversity of hidden structures in data. Probabilistic inference is the key operation on probabilistic models to obtain the distribution over the latent representations given data. Unfortunately, inference on complex models is extremely challenging computationally. In spite of the success of existing inference methods, like Markov chain Monte Carlo (MCMC) and variational inference (VI), many powerful models are not available for large-scale problems because inference is simply computationally intractable. Recent advances in using neural networks for probabilistic inference have shown promising results on this challenge. In this work, we propose a novel general inference framework that combines the strengths of both MCMC and VI. The proposed method is not only computationally scalable and efficient, but is also rooted in the ergodic theorem, which provides the guarantee of better performance with more computational power. Our experimental results suggest that our method can outperform state-of-the-art methods on generative models and Bayesian neural networks on some popular benchmark problems.


Learning Restricted Boltzmann Machines via Influence Maximization

arXiv.org Machine Learning

Graphical models are a rich language for describing high-dimensional distributions in terms of their dependence structure. While there are provable algorithms for learning graphical models in a variety of settings, there has been much less progress when there are latent variables. Here we study Restricted Boltzmann Machines (or RBMs), which are a popular model with wide-ranging applications in dimensionality reduction, collaborative filtering, topic modeling, feature extraction and deep learning. We give a simple greedy algorithm based on influence maximization to learn ferromagnetic RBMs with bounded degree. More precisely, we learn a description of the distribution on the observed variables as a Markov Random Field (or MRF), even though it exhibits complex higher-order interactions. Our analysis is based on tools from mathematical physics that were developed to show the concavity of magnetization. Moreover, our results extend in a straightforward manner to ferromagnetic Ising models with latent variables. Conversely, we show that the distribution on the observed nodes of a general RBM can simulate any MRF, which allows us to show new hardness results for improperly learning RBMs even with only a constant number of latent variables.


A Sliding-Window Algorithm for Markov Decision Processes with Arbitrarily Changing Rewards and Transitions

arXiv.org Machine Learning

We consider reinforcement learning in changing Markov Decision Processes where both the state-transition probabilities and the reward functions may vary over time. For this problem setting, we propose an algorithm using a sliding window approach and provide performance guarantees for the regret evaluated against the optimal non-stationary policy. We also characterize the optimal window size suitable for our algorithm. These results are complemented by a sample complexity bound on the number of sub-optimal steps taken by the algorithm. Finally, we present some experimental results to support our theoretical analysis.
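The sliding-window idea can be illustrated with a small sketch: estimate transition probabilities and mean rewards from only the most recent transitions, so that outdated dynamics are forgotten. This is not the algorithm from the paper (which also involves planning and a specific choice of window size); the class name `SlidingWindowModel` and its methods are hypothetical and chosen only for illustration.

```python
from collections import defaultdict, deque


class SlidingWindowModel:
    """Empirical MDP estimates built from the last `window` transitions only.

    Hypothetical helper for illustration; not the paper's algorithm.
    """

    def __init__(self, window):
        # deque(maxlen=...) silently drops the oldest transition when full.
        self.buffer = deque(maxlen=window)

    def observe(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def estimates(self):
        """Return (p_hat, r_hat) computed from the current window."""
        counts = defaultdict(int)
        next_counts = defaultdict(lambda: defaultdict(int))
        reward_sums = defaultdict(float)
        for s, a, r, s2 in self.buffer:
            counts[(s, a)] += 1
            next_counts[(s, a)][s2] += 1
            reward_sums[(s, a)] += r
        # Relative frequencies of next states for each visited (state, action).
        p_hat = {
            sa: {s2: c / counts[sa] for s2, c in nxt.items()}
            for sa, nxt in next_counts.items()
        }
        # Average observed reward for each visited (state, action).
        r_hat = {sa: reward_sums[sa] / n for sa, n in counts.items()}
        return p_hat, r_hat
```

A smaller window adapts faster to changes but yields noisier estimates; a larger window does the opposite. That trade-off is why the window size matters.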


Breathing-Based Authentication on Resource-Constrained IoT Devices using Recurrent Neural Networks

IEEE Computer

Recurrent neural networks (RNNs) have shown promising results in audio and speech-processing applications. The increasing popularity of Internet of Things (IoT) devices makes a strong case for implementing RNN-based inferences for applications such as acoustics-based authentication and voice commands for smart homes. However, the feasibility and performance of these inferences on resource-constrained devices remain largely unexplored. The authors compare traditional machine-learning models with deep-learning RNN models for an end-to-end authentication system based on breathing acoustics.


Inverse POMDP: Inferring What You Think from What You Do

arXiv.org Machine Learning

Complex behaviors are often driven by an internal model, which integrates sensory information over time and facilitates long-term planning. Inferring the internal model is a crucial ingredient for interpreting neural activities of agents and is beneficial for imitation learning. Here we describe a method to infer an agent's internal model and dynamic beliefs, and apply it to a simulated agent performing a foraging task. We assume the agent behaves rationally according to its understanding of the task and the relevant causal variables that cannot be fully observed. We model this rational solution as a Partially Observable Markov Decision Process (POMDP). However, we allow that the agent may have wrong assumptions about the task, and our method learns these assumptions from the agent's actions. Given the agent's sensory observations and actions, we learn its internal model by maximum likelihood estimation over a set of task-relevant parameters. The Markov property of the POMDP enables us to characterize the transition probabilities between internal states and iteratively estimate the agent's policy using a constrained Expectation-Maximization algorithm. We validate our method on simulated agents performing suboptimally on a foraging task, and successfully recover the agent's actual model.
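For readers unfamiliar with how a POMDP agent maintains "dynamic beliefs", the following is a minimal sketch of the standard discrete Bayesian belief update, b'(s') ∝ O(o | s', a) Σ_s T(s' | s, a) b(s). It shows only the forward belief dynamics, not the inverse inference method described in the abstract. The function name and the nested-list layout of `T` and `O` are assumptions made for this example.

```python
def belief_update(belief, action, observation, T, O):
    """One step of the standard discrete POMDP belief update.

    Assumed (hypothetical) layout:
      T[a][s][s2] = P(s2 | s, a)   transition model
      O[a][s2][o] = P(o | s2, a)   observation model
    """
    n = len(belief)
    # Prediction step: push the current belief through the transition model.
    predicted = [
        sum(T[action][s][s2] * belief[s] for s in range(n)) for s2 in range(n)
    ]
    # Correction step: weight each state by the observation likelihood.
    unnormalized = [O[action][s2][observation] * predicted[s2] for s2 in range(n)]
    z = sum(unnormalized)
    if z == 0:
        raise ValueError("observation has zero probability under the model")
    return [u / z for u in unnormalized]
```

If the agent's assumed `T` or `O` differs from the true task, its beliefs (and hence its actions) deviate from the optimum, which is the kind of mismatch the abstract proposes to recover from behavior.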


Geographical Hidden Markov Tree for Flood Extent Mapping (With Proof Appendix)

arXiv.org Machine Learning

Flood extent mapping plays a crucial role in addressing grand societal challenges such as disaster management, national water forecasting, as well as energy and food security. For example, during Hurricane Harvey floods in 2017, first responders needed to know where flood water was in order to plan rescue efforts. In national water forecasting, detailed flood extent maps can be used to calibrate and validate the NOAA National Water Model [15], which can forecast the flow of over 2.7 million rivers and streams through the entire continental U.S. [4]. In current practice, flood extent maps are mostly generated by flood forecasting models, whose accuracy is often unsatisfactory in high spatial details [4]. Other ways to generate flood maps involve sending field crew on the ground to record highwater marks, or visually interpreting earth observation imagery [2]. However, the process is both expensive and time consuming. With the large amount of high-resolution earth imagery being collected from satellites (e.g.,


Variational Inference for Data-Efficient Model Learning in POMDPs

arXiv.org Machine Learning

Partially observable Markov decision processes (POMDPs) are a powerful abstraction for tasks that require decision making under uncertainty, and capture a wide range of real world tasks. Today, effective planning approaches exist that generate effective strategies given black-box models of a POMDP task. Yet, an open question is how to acquire accurate models for complex domains. In this paper we propose DELIP, an approach to model learning for POMDPs that utilizes amortized structured variational inference. We empirically show that our model leads to effective control strategies when coupled with state-of-the-art planners. Intuitively, model-based approaches should be particularly beneficial in environments with changing reward structures, or where rewards are initially unknown. Our experiments confirm that DELIP is particularly effective in this setting.


Markov Chain Importance Sampling - a highly efficient estimator for MCMC

arXiv.org Machine Learning

Markov chain algorithms are ubiquitous in machine learning and statistics and many other disciplines. In this work we present a novel estimator applicable to several classes of Markov chains, dubbed Markov chain importance sampling (MCIS). For a broad class of Metropolis-Hastings algorithms, MCIS efficiently makes use of rejected proposals. For discretized Langevin diffusions, it provides a novel way of correcting the discretization error. Our estimator satisfies a central limit theorem and improves on error per CPU cycle, often to a large extent. As a by-product it enables estimating the normalizing constant, an important quantity in Bayesian machine learning and statistics.


Reinforcement Learning for Heterogeneous Teams with PALO Bounds

arXiv.org Artificial Intelligence

We introduce reinforcement learning for heterogeneous teams in which rewards for an agent are additively factored into local costs, stimuli unique to each agent, and global rewards, those shared by all agents in the domain. Motivating domains include coordination of varied robotic platforms, which incur different costs for the same action, but share an overall goal. We present two templates for learning in this setting with factored rewards: a generalization of Perkins' Monte Carlo exploring starts for POMDPs to canonical MPOMDPs, with a single policy mapping joint observations of all agents to joint actions (MCES-MP); and another with each agent individually mapping joint observations to their own action (MCES-FMP). We use probably approximately local optimal (PALO) bounds to analyze sample complexity, instantiating these templates to PALO learning. We promote sample efficiency by including a policy space pruning technique, and evaluate the approaches on three domains of heterogeneous agents, demonstrating that MCES-FMP yields improved policies in fewer samples compared to MCES-MP and a previous benchmark.


Crossmodal Attentive Skill Learner

arXiv.org Artificial Intelligence

This paper presents the Crossmodal Attentive Skill Learner (CASL), integrated with the recently-introduced Asynchronous Advantage Option-Critic (A2OC) architecture [Harb et al., 2017] to enable hierarchical reinforcement learning across multiple sensory inputs. We provide concrete examples where the approach not only improves performance in a single task, but accelerates transfer to new tasks. We demonstrate the attention mechanism anticipates and identifies useful latent features, while filtering irrelevant sensor modalities during execution. We modify the Arcade Learning Environment [Bellemare et al., 2013] to support audio queries, and conduct evaluations of crossmodal learning in the Atari 2600 game Amidar. Finally, building on the recent work of Babaeizadeh et al. [2017], we open-source a fast hybrid CPU-GPU implementation of CASL.