 Bayesian Inference


ADMM-based Networked Stochastic Variational Inference

arXiv.org Machine Learning

Owing to recent advances in "Big Data" modeling and prediction tasks, variational Bayesian estimation has gained popularity for its ability to provide exact solutions to approximate posteriors. One key technique for approximate inference is stochastic variational inference (SVI). SVI poses variational inference as a stochastic optimization problem and solves it iteratively using noisy gradient estimates. It aims to handle massive data for predictive and classification tasks by applying complex Bayesian models that have observed as well as latent variables. This paper aims to decentralize SVI, allowing parallel computation, secure learning, and robustness benefits. We use the Alternating Direction Method of Multipliers (ADMM) in a top-down setting to develop a distributed SVI algorithm in which independent learners running inference algorithms only need to share their estimated model parameters instead of their private datasets. Our work extends the distributed SVI-ADMM algorithm that we first propose to an ADMM-based networked SVI algorithm in which the learners not only work in a distributed fashion but also share information according to the rules of a graph by which they form a network. This kind of work falls under the umbrella of 'deep learning over networks', and we verify our algorithm on a topic-modeling problem for a corpus of Wikipedia articles. We illustrate the results on the latent Dirichlet allocation (LDA) topic model in large document classification, compare performance with the centralized algorithm, and use numerical experiments to corroborate the analytical results.
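
The idea that learners share only parameter estimates, never their data, can be illustrated with a toy consensus ADMM loop. This is a minimal sketch under stated assumptions, not the SVI-ADMM algorithm of the paper: each learner is assumed to hold a scalar quadratic objective 0.5 * (x - a_i)^2, and the names `consensus_admm`, `local_targets`, and `rho` are hypothetical.

```python
def consensus_admm(local_targets, rho=1.0, n_iters=100):
    """Toy consensus ADMM (assumption: learner i minimizes 0.5 * (x - a_i)^2).

    Only the local estimates x_i and dual variables u_i enter the shared
    consensus step; the private targets a_i are used in the local step only.
    """
    n = len(local_targets)
    x = [0.0] * n  # local parameter estimates, one per learner
    u = [0.0] * n  # scaled dual variables, one per learner
    z = 0.0        # shared consensus variable
    for _ in range(n_iters):
        # Local step: closed-form minimizer of
        # 0.5 * (x - a_i)^2 + (rho / 2) * (x - z + u_i)^2
        x = [(a + rho * (z - ui)) / (1.0 + rho)
             for a, ui in zip(local_targets, u)]
        # Consensus step: average the shared quantities x_i + u_i
        z = sum(xi + ui for xi, ui in zip(x, u)) / n
        # Dual step: accumulate each learner's disagreement with the consensus
        u = [ui + xi - z for xi, ui in zip(x, u)]
    return z, x


consensus, local_estimates = consensus_admm([1.0, 2.0, 6.0])
```

For these quadratic objectives the consensus value approaches the average of the private targets, and every local estimate approaches the consensus value.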


High-dimensional ABC

arXiv.org Machine Learning

This Chapter, "High-dimensional ABC", is to appear in the forthcoming Handbook of Approximate Bayesian Computation (2018). It details the main ideas and concepts behind extending ABC methods to higher dimensions, with supporting examples and illustrations.


Overview of Approximate Bayesian Computation

arXiv.org Machine Learning

This Chapter, "Overview of Approximate Bayesian Computation", is to appear as the first chapter in the forthcoming Handbook of Approximate Bayesian Computation (2018). It details the main ideas and concepts behind ABC methods with many examples and illustrations.


Pomegranate: fast and flexible probabilistic modeling in python

arXiv.org Machine Learning

We present pomegranate, an open source machine learning package for probabilistic modeling in Python. Probabilistic modeling encompasses a wide range of methods that explicitly describe uncertainty using probability distributions. Three widely used probabilistic models implemented in pomegranate are general mixture models, hidden Markov models, and Bayesian networks. A primary focus of pomegranate is to abstract away the complexities of training models from their definition. This allows users to focus on specifying the correct model for their application instead of being limited by their understanding of the underlying algorithms. An aspect of this focus involves the collection of additive sufficient statistics from data sets as a strategy for training models. This approach trivially enables many useful learning strategies, such as out-of-core learning, minibatch learning, and semi-supervised learning, without requiring the user to consider how to partition data or modify the algorithms to handle these tasks themselves. pomegranate is written in Cython to speed up calculations and releases the global interpreter lock to allow for built-in multithreaded parallelism, making it competitive with, or able to outperform, other implementations of similar algorithms. This paper presents an overview of the design choices in pomegranate, and how they have enabled complex features to be supported by simple code.
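
The role of additive sufficient statistics can be shown with a small standalone sketch. It does not use pomegranate's API. The class `GaussianStats` and its methods are hypothetical, written only to show why statistics that add across batches make minibatch and out-of-core fitting straightforward for a univariate Gaussian.

```python
from dataclasses import dataclass
import math


@dataclass
class GaussianStats:
    """Additive sufficient statistics for a univariate Gaussian (hypothetical helper)."""
    n: float = 0.0       # number of observations seen
    sum_x: float = 0.0   # running sum of x
    sum_x2: float = 0.0  # running sum of x squared

    def update(self, batch):
        # Summarize one batch; the batch itself can be discarded afterwards.
        for x in batch:
            self.n += 1
            self.sum_x += x
            self.sum_x2 += x * x

    def __add__(self, other):
        # Statistics from disjoint batches combine by simple addition.
        return GaussianStats(self.n + other.n,
                             self.sum_x + other.sum_x,
                             self.sum_x2 + other.sum_x2)

    def fit(self):
        # Maximum likelihood mean and standard deviation from the statistics.
        mean = self.sum_x / self.n
        var = self.sum_x2 / self.n - mean * mean
        return mean, math.sqrt(max(var, 0.0))


data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

full = GaussianStats()
full.update(data)

# Two "minibatches" summarized independently, then merged.
first, second = GaussianStats(), GaussianStats()
first.update(data[:2])
second.update(data[2:])
merged = first + second
```

Fitting from `merged` gives the same parameters as fitting from `full`, so the data never has to be held in memory at once.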


Noisy Natural Gradient as Variational Inference

arXiv.org Machine Learning

Variational Bayesian neural nets combine the flexibility of deep learning with Bayesian uncertainty estimation. Unfortunately, there is a tradeoff between cheap but simple variational families (e.g., fully factorized) and expensive and complicated inference procedures. We show that natural gradient ascent with adaptive weight noise implicitly fits a variational posterior to maximize the evidence lower bound (ELBO). This insight allows us to train full-covariance, fully factorized, or matrix-variate Gaussian variational posteriors using noisy versions of natural gradient, Adam, and K-FAC, respectively, making it possible to scale up to modern-size ConvNets. On standard regression benchmarks, our noisy K-FAC algorithm makes better predictions and matches Hamiltonian Monte Carlo's predictive variances better than existing methods. Its improved uncertainty estimates lead to more efficient exploration in active learning, and intrinsic motivation for reinforcement learning.


ABC Samplers

arXiv.org Machine Learning

This Chapter, "ABC Samplers", is to appear in the forthcoming Handbook of Approximate Bayesian Computation (2018). It details the main ideas and algorithms used to sample from the ABC approximation to the posterior distribution, including methods based on rejection/importance sampling, MCMC and sequential Monte Carlo.
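
Of the samplers listed, rejection sampling is the simplest to sketch. The example below is an illustration under assumptions, not code from the chapter: the model is assumed to be a Gaussian with unknown mean and unit variance, the prior is Uniform(-5, 5), the summary statistic is the sample mean, and the names `simulate`, `abc_rejection`, and `eps` are hypothetical.

```python
import random
import statistics


def simulate(theta, n, rng):
    # Assumed simulator: n draws from a Gaussian with mean theta and unit variance.
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def abc_rejection(observed, n_proposals, eps, rng):
    """Keep prior draws whose simulated summary lies within eps of the observed one."""
    obs_summary = statistics.fmean(observed)
    accepted = []
    for _ in range(n_proposals):
        theta = rng.uniform(-5.0, 5.0)  # draw from the assumed prior
        sim_summary = statistics.fmean(simulate(theta, len(observed), rng))
        if abs(sim_summary - obs_summary) < eps:
            accepted.append(theta)  # accepted draws approximate the posterior
    return accepted


rng = random.Random(0)
observed = simulate(1.0, 20, rng)  # synthetic "observed" data with true mean 1.0
posterior_sample = abc_rejection(observed, n_proposals=2000, eps=0.5, rng=rng)
```

Shrinking `eps` tightens the approximation at the cost of a lower acceptance rate, which is the tradeoff that motivates the MCMC and sequential Monte Carlo variants.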


Bayesian shape modelling of cross-sectional geological data

arXiv.org Machine Learning

In particular, the cross-sectional shapes of sand bodies help determine their oil-bearing capacity. Current classification schemes for sand body shapes are qualitative, simple, and ad hoc, and so there is a need for a quantitative analysis with the help of statistical models. There are several problems of interest: estimation of shape class parameters given labelled data shapes (a 'data shape' is an ordered set of points in $\mathbb{R}^2$); classification of new data shapes; and unsupervised classification. Parameter estimation is described by the probability $P(w \mid y, c)$, where $w$ denotes the shape class parameters and $y$ the dataset, which consists of several data shapes, together with their class labels $c$. By Bayes' theorem, this is given by $P(w \mid y, c) \propto P(y \mid w, c)\, P(w)$.


Adversarial Training for Probabilistic Spiking Neural Networks

arXiv.org Machine Learning

Classifiers trained using conventional empirical risk minimization or maximum likelihood methods are known to suffer dramatic performance degradations when tested over examples adversarially selected based on knowledge of the classifier's decision rule. Due to the prominence of Artificial Neural Networks (ANNs) as classifiers, their sensitivity to adversarial examples, as well as robust training schemes, have recently been the subject of intense investigation. In this paper, for the first time, the sensitivity of spiking neural networks (SNNs), or third-generation neural networks, to adversarial examples is studied. The study considers rate and time encoding, as well as rate and first-to-spike decoding. Furthermore, a robust training mechanism is proposed that is demonstrated to enhance the performance of SNNs under white-box attacks.


Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling

arXiv.org Machine Learning

Recent advances in deep reinforcement learning have made significant strides in performance on applications such as Go and Atari games. However, developing practical methods to balance exploration and exploitation in complex domains remains largely unsolved. Thompson Sampling and its extension to reinforcement learning provide an elegant approach to exploration that only requires access to posterior samples of the model. At the same time, advances in approximate Bayesian methods have made posterior approximation for flexible neural network models practical. Thus, it is attractive to consider approximate Bayesian neural networks in a Thompson Sampling framework. To understand the impact of using an approximate posterior on Thompson Sampling, we benchmark well-established and recently developed methods for approximate posterior sampling combined with Thompson Sampling over a series of contextual bandit problems. We found that many approaches that have been successful in the supervised learning setting underperformed in the sequential decision-making scenario. In particular, we highlight the challenge of adapting slowly converging uncertainty estimates to the online setting.
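
The claim that Thompson Sampling only requires access to posterior samples can be seen in its simplest form. The sketch below is not the benchmark's contextual setup with neural networks. It assumes a non-contextual Bernoulli bandit with independent Beta(1, 1) priors, and the function name `thompson_bernoulli` and the arm probabilities are hypothetical.

```python
import random


def thompson_bernoulli(true_probs, n_rounds, rng):
    """Thompson Sampling on a Bernoulli bandit with Beta(1, 1) priors (toy setting)."""
    k = len(true_probs)
    successes = [0] * k
    failures = [0] * k
    pulls = [0] * k
    for _ in range(n_rounds):
        # Draw one sample per arm from its Beta posterior.
        samples = [rng.betavariate(successes[i] + 1, failures[i] + 1)
                   for i in range(k)]
        # Act greedily with respect to the sampled values.
        arm = max(range(k), key=samples.__getitem__)
        reward = 1 if rng.random() < true_probs[arm] else 0
        # Conjugate posterior update for the pulled arm only.
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


pulls = thompson_bernoulli([0.2, 0.5, 0.8], n_rounds=2000, rng=random.Random(0))
```

Here the posterior is exact and cheap to sample. The benchmark's question is what happens when the `betavariate` draw is replaced by samples from an approximate posterior over neural network weights.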


Conditionally Independent Multiresolution Gaussian Processes

arXiv.org Machine Learning

We propose a multiresolution Gaussian process (GP) model which assumes conditional independence among GPs across resolutions. We characterize each GP using a particular representation of the Karhunen-Loève expansion where each basis vector of the representation consists of an axis and a scale factor, referred to as the basis axis and the basis-axis scale. The basis axes have unique characteristics: they are zero-mean by construction and are on the unit sphere. The axes are modeled using Bingham distributions, a natural choice for modeling axial data. Given the axes, all GPs across resolutions are independent; this is in direct contrast to the common assumption of full independence between GPs. More specifically, all GPs are tied to the same set of axes but the basis-axis scales of each GP are specific to the resolution on which they are defined. Relaxing the full independence assumption helps in reducing overfitting, which can be a problem in an otherwise identical model architecture with the full independence assumption. We consider a Bayesian treatment of the model using variational inference.