Goto

Collaborating Authors

 Directed Networks


Collapsed variational Bayes for Markov jump processes

Neural Information Processing Systems

Markov jump processes are continuous-time stochastic processes widely used in statistical applications in the natural sciences, and more recently in machine learning. Inference for these models typically proceeds via Markov chain Monte Carlo, and can suffer from various computational challenges. In this work, we propose a novel collapsed variational inference algorithm to address this issue. Our work leverages ideas from discrete-time Markov chains, and exploits a connection between these two through an idea called uniformization. Our algorithm proceeds by marginalizing out the parameters of the Markov jump process, and then approximating the distribution over the trajectory with a factored distribution over segments of a piecewise-constant function. Unlike MCMC schemes that marginalize out transition times of a piecewise-constant process, our scheme optimizes the discretization of time, resulting in significant computational savings. We apply our ideas to synthetic data as well as a dataset of check-in recordings, where we demonstrate superior performance over state-of-the-art MCMC methods.


PASS-GLM: polynomial approximate sufficient statistics for scalable Bayesian GLM inference

Neural Information Processing Systems

Generalized linear models (GLMs)---such as logistic regression, Poisson regression, and robust regression---provide interpretable models for diverse data types. Probabilistic approaches, particularly Bayesian ones, allow coherent estimates of uncertainty, incorporation of prior information, and sharing of power across experiments via hierarchical models. In practice, however, the approximate Bayesian methods necessary for inference have either failed to scale to large data sets or failed to provide theoretical guarantees on the quality of inference. We propose a new approach based on constructing polynomial approximate sufficient statistics for GLMs (PASS-GLM). We demonstrate that our method admits a simple algorithm as well as trivial streaming and distributed extensions that do not compound error across computations. We provide theoretical guarantees on the quality of point (MAP) estimates, the approximate posterior, and posterior mean and uncertainty estimates. We validate our approach empirically in the case of logistic regression using a quadratic approximation and show competitive performance with stochastic gradient descent, MCMC, and the Laplace approximation in terms of speed and multiple measures of accuracy---including on an advertising data set with 40 million data points and 20,000 covariates.


Model-based Bayesian inference of neural activity and connectivity from all-optical interrogation of a neural circuit

Neural Information Processing Systems

Population activity measurement by calcium imaging can be combined with cellular resolution optogenetic activity perturbations to enable the mapping of neural connectivity in vivo. This requires accurate inference of perturbed and unperturbed neural activity from calcium imaging measurements, which are noisy and indirect, and can also be contaminated by photostimulation artifacts. We have developed a new fully Bayesian approach to jointly inferring spiking activity and neural connectivity from in vivo all-optical perturbation experiments. In contrast to standard approaches that perform spike inference and analysis in two separate maximum-likelihood phases, our joint model is able to propagate uncertainty in spike inference to the inference of connectivity and vice versa. We use the framework of variational autoencoders to model spiking activity using discrete latent variables, low-dimensional latent common input, and sparse spike-and-slab generalized linear coupling between neurons. Additionally, we model two properties of the optogenetic perturbation: off-target photostimulation and photostimulation transients. Using this model, we were able to fit models on 30 minutes of data in just 10 minutes. We performed an all-optical circuit mapping experiment in primary visual cortex of the awake mouse, and use our approach to predict neural connectivity between excitatory neurons in layer 2/3. Predicted connectivity is sparse and consistent with known correlations with stimulus tuning, spontaneous correlation and distance.


Tractability in Structured Probability Spaces

Neural Information Processing Systems

Recently, the Probabilistic Sentential Decision Diagram (PSDD) has been proposed as a framework for systematically inducing and learning distributions over structured objects, including combinatorial objects such as permutations and rankings, paths and matchings on a graph, etc. In this paper, we study the scalability of such models in the context of representing and learning distributions over routes on a map. In particular, we introduce the notion of a hierarchical route distribution and show how they can be leveraged to construct tractable PSDDs over route distributions, allowing them to scale to larger maps. We illustrate the utility of our model empirically, in a route prediction task, showing how accuracy can be increased significantly compared to Markov models.


Streaming Sparse Gaussian Process Approximations

Neural Information Processing Systems

Sparse pseudo-point approximations for Gaussian process (GP) models provide a suite of methods that support deployment of GPs in the large data regime and enable analytic intractabilities to be sidestepped. However, the field lacks a principled method to handle streaming data in which both the posterior distribution over function values and the hyperparameter estimates are updated in an online fashion. The small number of existing approaches either use suboptimal hand-crafted heuristics for hyperparameter learning, or suffer from catastrophic forgetting or slow updating when new data arrive. This paper develops a new principled framework for deploying Gaussian process probabilistic models in the streaming setting, providing methods for learning hyperparameters and optimising pseudo-input locations. The proposed framework is assessed using synthetic and real-world datasets.


Bayesian Compression for Deep Learning

Neural Information Processing Systems

Compression and computational efficiency in deep learning have become a problem of great significance. In this work, we argue that the most principled and effective way to attack this problem is by adopting a Bayesian point of view, where through sparsity inducing priors we prune large parts of the network. We introduce two novelties in this paper: 1) we use hierarchical priors to prune nodes instead of individual weights, and 2) we use the posterior uncertainties to determine the optimal fixed point precision to encode the weights. Both factors significantly contribute to achieving the state of the art in terms of compression rates, while still staying competitive with methods designed to optimize for speed or energy efficiency.


Tomography of the London Underground: a Scalable Model for Origin-Destination Data

Neural Information Processing Systems

The paper addresses the classical network tomography problem of inferring local traffic given origin-destination observations. Focusing on large complex public transportation systems, we build a scalable model that exploits input-output information toestimate the unobserved link/station loads and the users' path preferences. Based on the reconstruction of the users' travel time distribution, the model is flexible enough to capture possible different path-choice strategies and correlations between users travelling on similar paths at similar times. The corresponding likelihood function is intractable for medium or large-scale networks and we propose twodistinct strategies, namely the exact maximum-likelihood inference of an approximate but tractable model and the variational inference of the original intractable model. As an application of our approach, we consider the emblematic case of the London underground network, where a tap-in/tap-out system tracks the starting/exit time and location of all journeys in a day. A set of synthetic simulations and real data provided by Transport For London are used to validate and test the model on the predictions of observable and unobservable quantities.


Model-Powered Conditional Independence Test

Neural Information Processing Systems

We consider the problem of non-parametric Conditional Independence testing (CI testing) for continuous random variables. Given i.i.d samples from the joint distribution $f(x,y,z)$ of continuous random vectors $X,Y$ and $Z,$ we determine whether $X \independent Y \vert Z$. We approach this by converting the conditional independence test into a classification problem. This allows us to harness very powerful classifiers like gradient-boosted trees and deep neural networks. These models can handle complex probability distributions and allow us to perform significantly better compared to the prior state of the art, for high-dimensional CI testing. The main technical challenge in the classification problem is the need for samples from the conditional product distribution $f^{CI}(x,y,z) = f(x|z)f(y|z)f(z)$ -- the joint distribution if and only if $X \independent Y \vert Z.$ -- when given access only to i.i.d. samples from the true joint distribution $f(x,y,z)$. To tackle this problem we propose a novel nearest neighbor bootstrap procedure and theoretically show that our generated samples are indeed close to $f^{CI}$ in terms of total variational distance. We then develop theoretical results regarding the generalization bounds for classification for our problem, which translate into error bounds for CI testing. We provide a novel analysis of Rademacher type classification bounds in the presence of non-i.i.d \textit{near-independent} samples. We empirically validate the performance of our algorithm on simulated and real datasets and show performance gains over previous methods.


Cold-Start Reinforcement Learning with Softmax Policy Gradient

Neural Information Processing Systems

Policy-gradient approaches to reinforcement learning have two common and undesirable overhead procedures, namely warm-start training and sample variance reduction. In this paper, we describe a reinforcement learning method based on a softmax value function that requires neither of these procedures. Our method combines the advantages of policy-gradient methods with the efficiency and simplicity of maximum-likelihood approaches. We apply this new cold-start reinforcement learning method in training sequence generation models for structured output prediction problems. Empirical evidence validates this method on automatic summarization and image captioning tasks.


Multi-way Interacting Regression via Factorization Machines

Neural Information Processing Systems

We propose a Bayesian regression method that accounts for multi-way interactions of arbitrary orders among the predictor variables. Our model makes use of a factorization mechanism for representing the regression coefficients of interactions among the predictors, while the interaction selection is guided by a prior distribution on random hypergraphs, a construction which generalizes the Finite Feature Model. We present a posterior inference algorithm based on Gibbs sampling, and establish posterior consistency of our regression model. Our method is evaluated with extensive experiments on simulated data and demonstrated to be able to identify meaningful interactions in applications in genetics and retail demand forecasting.