Collaborating Authors

 Opper, Manfred


A Tractable Approximation to Optimal Point Process Filtering: Application to Neural Encoding

Neural Information Processing Systems

The process of dynamic state estimation (filtering) based on point process observations is in general intractable. Numerical sampling techniques are often practically useful, but lead to limited conceptual insight about optimal encoding/decoding strategies, which are of significant relevance to Computational Neuroscience. We develop an analytically tractable Bayesian approximation to optimal filtering based on point process observations, which allows us to introduce distributional assumptions about sensory cell properties that greatly facilitate the analysis of optimal encoding in situations deviating from common assumptions of uniform coding. The analytic framework leads to insights which are difficult to obtain from numerical algorithms, and is consistent with experiments about the distribution of tuning curve centers. Interestingly, we find that the information gained from the absence of spikes may be crucial to performance.
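
As a point of reference for the filtering problem described above (a standard result, not the paper's specific derivation): if the latent state $x_t$ is observed through a point process with conditional intensity $\lambda(x_t)$, then over a short interval $[t, t+dt]$ with spike increment $dN_t \in \{0,1\}$ the exact posterior update is

$$ p(x_{t+dt} \mid N_{0:t+dt}) \;\propto\; p(x_{t+dt} \mid N_{0:t}) \, \lambda(x_{t+dt})^{dN_t} \, e^{-\lambda(x_{t+dt})\,dt}. $$

A spike reweights the predictive density by the intensity, while silence reweights it by $e^{-\lambda\,dt}$, which is where the information from the absence of spikes enters.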


Tightening Bounds for Variational Inference by Revisiting Perturbation Theory

arXiv.org Machine Learning

Variational inference has become one of the most widely used methods in latent variable modeling. In its basic form, variational inference employs a fully factorized variational distribution and minimizes its KL divergence to the posterior. As the minimization can only be carried out approximately, this approximation induces a bias. In this paper, we revisit perturbation theory as a powerful way of improving the variational approximation. Perturbation theory relies on a form of Taylor expansion of the log marginal likelihood, loosely speaking, in terms of the log ratio of the true posterior and its variational approximation. While the first-order term gives the classical variational bound, higher-order terms yield corrections that tighten it. However, traditional perturbation theory does not provide a lower bound, making it unsuitable for stochastic optimization. Here, we present an alternative way of deriving corrections to the ELBO that resemble perturbation theory but result in a valid bound. We show in experiments on Gaussian Processes and Variational Autoencoders that the new bounds are more mass covering, and that the resulting posterior covariances are closer to the true posterior and lead to higher likelihoods on held-out data.
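
For orientation, the quantities involved can be written in generic form (standard definitions, not the paper's specific construction). With variational distribution $q(z)$ and log ratio $V(z) = \log p(x,z) - \log q(z)$, the marginal likelihood is an expectation of $e^{V}$ under $q$,

$$ p(x) = \mathbb{E}_{q}\big[ e^{V(z)} \big], \qquad \log p(x) \;\ge\; \mathbb{E}_{q}\big[ V(z) \big] = \mathrm{ELBO}(q), $$

so expanding $e^{V}$ around a reference point recovers the ELBO at first order, while higher-order terms supply the corrections discussed above; the difficulty is that a truncated expansion is not automatically a lower bound.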


Multi-Class Gaussian Process Classification Made Conjugate: Efficient Inference via Data Augmentation

arXiv.org Machine Learning

We propose a new scalable multi-class Gaussian process classification approach building on a novel modified softmax likelihood function. The new likelihood has two benefits: it leads to well-calibrated uncertainty estimates and allows for an efficient latent variable augmentation. The augmented model has the advantage that it is conditionally conjugate, leading to a fast variational inference method via block coordinate ascent updates. Previous approaches suffered from a trade-off between uncertainty calibration and speed. Our experiments show that our method leads to well-calibrated uncertainty estimates and competitive predictive performance while being up to two orders of magnitude faster than the state of the art.
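
One candidate for such a modified softmax (shown here for illustration; the paper's exact likelihood may differ in detail) is the logistic-softmax, which replaces the exponentials of the ordinary softmax $e^{f_k}/\sum_c e^{f_c}$ with logistic sigmoids:

$$ p(y = k \mid f_1,\dots,f_C) \;=\; \frac{\sigma(f_k)}{\sum_{c=1}^{C} \sigma(f_c)}, \qquad \sigma(z) = \frac{1}{1+e^{-z}}. $$

Because each factor is a sigmoid of a single latent function, this form is amenable to Pólya-Gamma-type augmentation, which is the kind of structure that makes a conditionally conjugate model possible.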


Efficient Bayesian Inference of Sigmoidal Gaussian Cox Processes

arXiv.org Machine Learning

We present an approximate Bayesian inference approach for estimating the intensity of an inhomogeneous Poisson process, where the intensity function is modelled using a Gaussian process (GP) prior via a sigmoid link function. Augmenting the model with a latent marked Poisson process and Pólya-Gamma random variables, we obtain a representation of the likelihood which is conjugate to the GP prior. We approximate the posterior using a free-form mean field approximation together with the framework of sparse GPs. Furthermore, as an alternative approximation we suggest a sparse Laplace approximation of the posterior, for which an efficient expectation-maximisation algorithm is derived to find the posterior's mode. Results of both algorithms compare well with exact inference obtained by a Markov Chain Monte Carlo sampler and a standard variational Gaussian approach, while being one order of magnitude faster.
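
In models of this type the intensity is typically parameterised as a scaled sigmoid of a GP (generic notation):

$$ \lambda(x) \;=\; \lambda^{*}\,\sigma\big(g(x)\big), \qquad g \sim \mathcal{GP}(0,k), \quad \sigma(z) = \frac{1}{1+e^{-z}}, $$

so that $0 \le \lambda(x) \le \lambda^{*}$. The Poisson-process likelihood then contains sigmoid factors at the observed points as well as the term $\exp(-\int \lambda(x)\,dx)$ over the domain, which are the two pieces the latent marked Poisson process and the Pólya-Gamma variables are introduced to handle.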


Efficient Bayesian Inference for a Gaussian Process Density Model

arXiv.org Machine Learning

We reconsider a nonparametric density model based on Gaussian processes. By augmenting the model with latent Pólya-Gamma random variables and a latent marked Poisson process, we obtain a new likelihood which is conjugate to the model's Gaussian process prior. The augmented posterior allows for efficient inference by Gibbs sampling and an approximate variational mean field approach. For the latter we utilise sparse GP approximations to tackle the infinite dimensionality of the problem. The performance of both algorithms and comparisons with other density estimators are demonstrated on artificial and real datasets with up to several thousand data points.
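
For context, a common construction for a GP-based density (generic form; the paper's exact parameterisation may differ) passes a GP through a sigmoid and normalises against a simple base density $\pi(x)$:

$$ p(x) \;=\; \frac{\sigma\big(g(x)\big)\,\pi(x)}{\int \sigma\big(g(x')\big)\,\pi(x')\,dx'}, \qquad g \sim \mathcal{GP}(0,k). $$

The intractable normaliser in the denominator is the main obstacle, and it is this term that a latent marked Poisson process representation is designed to remove.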


Efficient Gaussian Process Classification Using Polya-Gamma Data Augmentation

arXiv.org Machine Learning

We propose an efficient stochastic variational approach to GP classification building on Pólya-Gamma data augmentation and inducing points, which is based on closed-form updates of natural gradients. We evaluate the algorithm on real-world datasets containing up to 11 million data points and demonstrate that it is up to three orders of magnitude faster than the state of the art while being competitive in terms of prediction performance.
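
The identity behind Pólya-Gamma augmentation (a standard result, stated here for reference) rewrites the logistic likelihood of a label $y_n \in \{-1,+1\}$ given a latent function value $f_n$ as a Gaussian-shaped integral over an auxiliary variable $\omega_n$:

$$ \sigma(y_n f_n) \;=\; \frac{e^{y_n f_n/2}}{2\cosh(f_n/2)} \;=\; \tfrac{1}{2}\, e^{y_n f_n/2}\; \mathbb{E}_{\omega_n \sim \mathrm{PG}(1,0)}\!\big[ e^{-\omega_n f_n^{2}/2} \big]. $$

Conditioned on $\omega_n$, the likelihood is Gaussian in $f_n$ and hence conjugate to the GP prior, which is the structural property that permits closed-form natural-gradient updates of the variational parameters.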


Perturbative Black Box Variational Inference

arXiv.org Machine Learning

Black box variational inference (BBVI) with reparameterization gradients triggered the exploration of divergence measures other than the Kullback-Leibler (KL) divergence, such as alpha divergences. In this paper, we view BBVI with generalized divergences as a form of estimating the marginal likelihood via biased importance sampling. The choice of divergence determines a bias-variance trade-off between the tightness of a bound on the marginal likelihood (low bias) and the variance of its gradient estimators. Drawing on variational perturbation theory of statistical physics, we use these insights to construct a family of new variational bounds. Enumerated by an odd integer order $K$, this family captures the standard KL bound for $K=1$, and converges to the exact marginal likelihood as $K\to\infty$. Compared to alpha-divergences, our reparameterization gradients have a lower variance. We show in experiments on Gaussian Processes and Variational Autoencoders that the new bounds are more mass covering, and that the resulting posterior covariances are closer to the true posterior and lead to higher likelihoods on held-out data.
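
The underlying idea can be sketched with a simple inequality (generic notation; see the paper for the actual construction): for any real $u$ and odd $K$, $e^{u} \ge \sum_{k=0}^{K} u^{k}/k!$. Applying this with $u = V(z) - V_0$, where $V(z) = \log p(x,z) - \log q(z)$ and $V_0$ is a reference constant, gives

$$ p(x) \;=\; e^{V_0}\,\mathbb{E}_{q}\!\Big[ e^{V(z)-V_0} \Big] \;\ge\; e^{V_0}\,\mathbb{E}_{q}\!\Big[ \sum_{k=0}^{K} \frac{\big(V(z)-V_0\big)^{k}}{k!} \Big], $$

a valid lower bound on the marginal likelihood for every odd $K$. For $K=1$ and $V_0 = \mathbb{E}_q[V]$ the right-hand side reduces to $e^{\mathrm{ELBO}}$, the standard KL bound, and the bound tightens as $K$ grows.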


Inverse Ising problem in continuous time: A latent variable approach

arXiv.org Machine Learning

We consider the inverse Ising problem, i.e. the inference of network couplings from observed spin trajectories, for a model with continuous-time Glauber dynamics. By introducing two sets of auxiliary latent random variables we render the likelihood into a form which allows for simple iterative inference algorithms with analytical updates. The variables are: (1) Poisson variables to linearise an exponential term which is typical for point process likelihoods, and (2) Pólya-Gamma variables, which make the likelihood quadratic in the coupling parameters. Using the augmented likelihood, we derive an expectation-maximization (EM) algorithm to obtain the maximum likelihood estimate of the network parameters. Using a third set of latent variables we extend the EM algorithm to sparse couplings via L1 regularization. Finally, we develop an efficient approximate Bayesian inference algorithm using a variational approach. We demonstrate the performance of our algorithms on data simulated from an Ising model. For data simulated from a more biologically plausible network of spiking neurons, we show that the Ising model captures the low-order statistics of the data well and how the inferred Ising couplings are related to the underlying synaptic structure of the simulated network.
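
The structure of the likelihood can be sketched in standard point-process form (generic notation): if spin $i$ flips at times $\{t_i^{(m)}\}$ with instantaneous rate $\lambda_i(t)$ determined by the current configuration and the couplings, then

$$ p(\text{trajectory}) \;\propto\; \prod_{i} \Big[ \prod_{m} \lambda_i\big(t_i^{(m)}\big) \Big] \exp\!\Big( -\int_{0}^{T} \lambda_i(t)\,dt \Big). $$

The exponential of the integrated rates is the term the Poisson variables linearise, and in the usual parameterisation of Glauber dynamics the rates depend on the local fields $\sum_j J_{ij}\, s_j(t)$ through a sigmoid, which is what the Pólya-Gamma variables turn into a quadratic form in the couplings.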


A statistical physics approach to learning curves for the Inverse Ising problem

arXiv.org Machine Learning

Using methods of statistical physics, we analyse the error of learning couplings in large Ising models from independent data (the inverse Ising problem). We concentrate on learning based on local cost functions, such as the pseudo-likelihood method, for which the couplings are inferred independently for each spin. Assuming that the data are generated from a true Ising model, we compute the reconstruction error of the couplings using a combination of the replica method with the cavity approach for densely connected systems. We show that an explicit estimator based on a quadratic cost function achieves minimal reconstruction error, but requires the length of the true coupling vector as prior knowledge. A simple mean field estimator of the couplings, which does not need such knowledge, is asymptotically optimal, i.e. in the regime where the number of observations is much larger than the number of spins. Comparison of the theory with numerical simulations shows excellent agreement for data generated from two models with random couplings in the high-temperature region: a model with independent couplings (the Sherrington-Kirkpatrick model), and a model where the matrix of couplings has a Wishart distribution.
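
For reference, the pseudo-likelihood approach mentioned above is based on the exact conditional of one spin given all others, which for an Ising model (external fields omitted for brevity) reads

$$ p\big(s_i \mid s_{\setminus i}\big) \;=\; \frac{\exp\!\big( s_i \sum_{j \ne i} J_{ij}\, s_j \big)}{2\cosh\!\big( \sum_{j \ne i} J_{ij}\, s_j \big)}, $$

so each row of couplings $J_{i\cdot}$ can be fitted independently by maximising $\sum_{\mu} \log p\big(s_i^{(\mu)} \mid s_{\setminus i}^{(\mu)}\big)$ over the observed samples; this is the local cost function setting analysed in the paper.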