Bayesian Inference
Doubly Robust Bayesian Inference for Non-Stationary Streaming Data with \beta-Divergences
We present the very first robust Bayesian Online Changepoint Detection algorithm through General Bayesian Inference (GBI) with \beta-divergences. The resulting inference procedure is doubly robust for both the predictive and the changepoint (CP) posterior, with linear time and constant space complexity. We provide a construction for exponential models and demonstrate it on the Bayesian Linear Regression model. In so doing, we make two additional contributions: Firstly, we make GBI scalable using Structural Variational approximations that are exact as \beta \to 0. Secondly, we give a principled way of choosing the divergence parameter \beta by minimizing expected predictive loss on-line.
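To make the role of \beta concrete, here is a minimal sketch of the \beta-divergence (density power) loss that replaces the negative log-likelihood in General Bayesian Inference. It is specialised to a univariate Gaussian, for which the integral term has a closed form. The function names and the Gaussian choice are illustrative assumptions, not the construction for exponential models described in the abstract.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def neg_log_lik(x, mu, sigma):
    """Standard Bayesian loss: negative Gaussian log-density (unbounded in x)."""
    z = (x - mu) / sigma
    return 0.5 * math.log(2.0 * math.pi * sigma ** 2) + 0.5 * z * z


def beta_loss(x, mu, sigma, beta):
    """beta-divergence loss: -(1/beta) f(x)^beta + (1/(1+beta)) * int f(z)^(1+beta) dz.

    For a Gaussian the integral equals (2*pi*sigma^2)^(-beta/2) / sqrt(1 + beta).
    """
    integral = (2.0 * math.pi * sigma ** 2) ** (-beta / 2.0) / math.sqrt(1.0 + beta)
    return -(1.0 / beta) * gaussian_pdf(x, mu, sigma) ** beta + integral / (1.0 + beta)


if __name__ == "__main__":
    for x in (0.0, 10.0, 100.0):
        print(x, neg_log_lik(x, 0.0, 1.0), beta_loss(x, 0.0, 1.0, beta=0.5))
```

In this sketch the log-loss grows quadratically as an observation moves away from the mean, whereas the \beta-loss flattens out. This bounded influence of outliers is the kind of robustness the abstract refers to.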
Large-Scale Stochastic Sampling from the Probability Simplex
Stochastic gradient Markov chain Monte Carlo (SGMCMC) has become a popular method for scalable Bayesian inference. These methods are based on sampling a discrete-time approximation to a continuous-time process, such as the Langevin diffusion. When applied to distributions defined on a constrained space, the time-discretization error can dominate when we are near the boundary of the space. We demonstrate that because of this, current SGMCMC methods for the simplex struggle with sparse simplex spaces, that is, spaces in which many of the components are close to zero. Unfortunately, many popular large-scale Bayesian models, such as network or topic models, require inference on sparse simplex spaces.
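The boundary problem can be illustrated with a one-dimensional sketch. It assumes a Gamma(alpha, 1) target on the positive half-line with alpha < 1 as a stand-in for a single sparse simplex component, and it uses the exact gradient rather than a stochastic one. The target, step size, and function names are all illustrative assumptions and are not taken from the paper.

```python
import math
import random


def grad_log_gamma(theta, alpha):
    """Gradient of log Gamma(alpha, 1) density: d/dtheta [(alpha - 1) log(theta) - theta]."""
    return (alpha - 1.0) / theta - 1.0


def langevin_step(theta, alpha, step, rng):
    """One Euler-Maruyama step of the Langevin diffusion targeting Gamma(alpha, 1)."""
    noise = rng.gauss(0.0, 1.0)
    return theta + 0.5 * step * grad_log_gamma(theta, alpha) + math.sqrt(step) * noise


def fraction_leaving_domain(theta, alpha, step, n_trials=10000, seed=0):
    """Fraction of single steps from `theta` that land outside the domain theta > 0."""
    rng = random.Random(seed)
    exits = sum(
        1 for _ in range(n_trials) if langevin_step(theta, alpha, step, rng) <= 0.0
    )
    return exits / n_trials


if __name__ == "__main__":
    for start in (0.01, 0.1, 1.0, 5.0):
        print(start, fraction_leaving_domain(start, alpha=0.1, step=0.01))
```

Far from the boundary the discretized step essentially never leaves the domain. Close to zero the drift term (alpha - 1) / theta becomes very large, so a fixed step size pushes most proposals outside the constrained space.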
Learning Concave Conditional Likelihood Models for Improved Analysis of Tandem Mass Spectra
The most widely used technology to identify the proteins present in a complex biological sample is tandem mass spectrometry, which quickly produces a large collection of spectra representative of the peptides (i.e., protein subsequences) present in the original sample. In this work, we greatly expand the parameter learning capabilities of a dynamic Bayesian network (DBN) peptide-scoring algorithm, Didea, by deriving emission distributions for which its conditional log-likelihood scoring function remains concave. We show that this class of emission distributions, called Convex Virtual Emissions (CVEs), naturally generalizes the log-sum-exp function while rendering both maximum likelihood estimation and conditional maximum likelihood estimation concave for a wide range of Bayesian networks. Utilizing CVEs in Didea allows efficient learning of a large number of parameters while ensuring global convergence, in stark contrast to Didea's previous parameter learning framework (which could only learn a single parameter using a costly grid search) and other trainable models (which only ensure convergence to local optima). The newly trained scoring function substantially outperforms the state-of-the-art in both scoring function accuracy and downstream Fisher kernel analysis.
Scaling the Poisson GLM to massive neural datasets through polynomial approximations
Recent advances in recording technologies have allowed neuroscientists to record simultaneous spiking activity from hundreds to thousands of neurons in multiple brain regions. Such large-scale recordings pose a major challenge to existing statistical methods for neural data analysis. Here we develop highly scalable approximate inference methods for Poisson generalized linear models (GLMs) that require only a single pass over the data. Our approach relies on a recently proposed method for obtaining approximate sufficient statistics for GLMs using polynomial approximations [Huggins et al., 2017], which we adapt to the Poisson GLM setting. We focus on inference using quadratic approximations to nonlinear terms in the Poisson GLM log-likelihood with Gaussian priors, for which we derive closed-form solutions to the approximate maximum likelihood and MAP estimates, posterior distribution, and marginal likelihood.
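As a rough illustration of why a quadratic approximation yields single-pass, closed-form estimates, consider a scalar Poisson GLM with an exponential nonlinearity and log-likelihood sum_i [y_i x_i w - exp(x_i w)]. Replacing exp(z) by a0 + a1 z + a2 z^2 makes the log-likelihood quadratic in w, so a zero-mean Gaussian prior gives a Gaussian approximate posterior that depends on the data only through three running sums. The sketch below assumes the Taylor coefficients a1 = 1 and a2 = 0.5 as defaults for simplicity (other polynomial fits can be passed in); the scalar setting and all names are hypothetical.

```python
def sufficient_stats(xs, ys):
    """Accumulate sum(x), sum(x^2), sum(x*y) in a single pass over the data."""
    s_x = s_xx = s_xy = 0.0
    for x, y in zip(xs, ys):
        s_x += x
        s_xx += x * x
        s_xy += x * y
    return s_x, s_xx, s_xy


def approx_posterior(xs, ys, a1=1.0, a2=0.5, prior_var=1.0):
    """Gaussian approximate posterior over the scalar weight w.

    With exp(z) ~ a0 + a1*z + a2*z^2 and prior w ~ N(0, prior_var), the
    approximate log-posterior is
        (s_xy - a1*s_x) * w - (a2*s_xx + 1/(2*prior_var)) * w^2 + const,
    which is Gaussian with the precision and mean computed below.
    """
    s_x, s_xx, s_xy = sufficient_stats(xs, ys)
    precision = 2.0 * a2 * s_xx + 1.0 / prior_var
    mean = (s_xy - a1 * s_x) / precision  # also the approximate MAP estimate
    return mean, 1.0 / precision


if __name__ == "__main__":
    mean, var = approx_posterior([0.1, -0.2, 0.3], [1, 0, 2])
    print(mean, var)
```

Because the three sums can be updated one observation at a time, the data never needs to be revisited. In the multivariate case the same role would be played by X^T y, X^T 1, and X^T X.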
Bayesian Inference of Temporal Task Specifications from Demonstrations
When observing task demonstrations, human apprentices are able to identify whether a given task is executed correctly long before they gain expertise in actually performing that task. Prior research into learning from demonstrations (LfD) has failed to capture this notion of the acceptability of an execution; meanwhile, temporal logics provide a flexible language for expressing task specifications. Inspired by this, we present Bayesian specification inference, a probabilistic model for inferring task specification as a temporal logic formula. We incorporate methods from probabilistic programming to define our priors, along with a domain-independent likelihood function to enable sampling-based inference. We demonstrate the efficacy of our model for inferring true specifications with over 90% similarity between the inferred specification and the ground truth, both within a synthetic domain and a real-world table setting task.
Reviews: Model-based Bayesian inference of neural activity and connectivity from all-optical interrogation of a neural circuit
This paper proposes a method for inferring (biological) neural connectivity from fluorescence (calcium) traces. The model includes the spiking model (GLM low-rank factor) with an external input (optical stimulation) and a fluorescence model. The inference method is based on variational Bayes, where the approximate posterior is modeled using a neural network. Novelty and originality: The methods in this paper are adequately novel and original, nicely combining various elements from previous work. Technical issues: My main problem with this paper is that I can't really be sure that the proposed method is actually working well. It is very good that the authors tested their method on real data, but since there is no ground truth, it is hard to estimate the quality of the inferred weights (see footnote (1) below).
Reviews: Differentially private Bayesian learning on distributed data
This paper develops a method for differentially private (DP) Bayesian learning in a distributed setting, where data is split up over multiple clients. This differs from the traditional DP Bayesian learning setting, in which a single party has access to the full dataset. The main issue here is that performing DP methods separately on each client would yield too much noise; the goal is then to find a way to add an appropriate amount of noise, without compromising privacy, in this setting. To solve this, the authors introduce a method that combines existing DP Bayesian learning methods with a secure multi-party communication method called the DCA algorithm. Theoretically, this paper shows that the method satisfies differential privacy.
Reviews: Thermostat-assisted continuously-tempered Hamiltonian Monte Carlo for Bayesian learning
This paper presents a sampling method that combines Hamiltonian Monte Carlo (HMC), mini-batches, tempering, and thermostats to explore multimodal target distributions more efficiently. It is demonstrated on a number of substantial neural network problems using real data sets. This is an interesting method, and the empirical results are quite substantial. Figure 2 does a nice job of demonstrating how the omission of any of the ingredients (e.g. the tempering, or the thermostat) is detrimental to the overall result, which illustrates how well the components work together. This is followed by some substantial image classification examples.
Reviews: Inference Aided Reinforcement Learning for Incentive Mechanism Design in Crowdsourcing
Summary: In this paper, the authors explore the problem of data collection via crowdsourcing. In the setting of the paper, each task is a labeling task with binary labels, and workers are strategic in choosing effort levels and reporting strategies that maximize their utility. The true label for each task and the workers' parameters are all unknown to the requester. The requester's goal is to learn how to decide the payment and how to aggregate the collected labels by learning from workers' past answers. The authors' proposed approach is a combination of incentive design, Bayesian inference, and reinforcement learning.
Reviews: Bayesian Model-Agnostic Meta-Learning
Summary: Meta-learning is motivated by the promise of being able to transfer knowledge from previous learning experiences to new task settings, such that a new task can be learned more effectively from few observations. Yet, updating highly parametric models with small amounts of data can easily lead to overfitting. A promising avenue towards overcoming this challenge is a Bayesian treatment of meta-learning. This work builds on top of recent work that provides a Bayesian interpretation of MAML (model-agnostic meta-learning). This contribution is a direct extension of (Grant et al., 2018), where the task-train posterior was approximated via a Gaussian distribution. Applying SVGD instead allows for a more flexible and (potentially) more accurate approximation of a highly complex posterior.
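For readers unfamiliar with SVGD (Stein variational gradient descent), the following minimal one-dimensional sketch shows the particle update that stands in for a single Gaussian approximation. Each particle is pulled toward high-density regions by the kernel-weighted score and pushed away from its neighbours by the kernel gradient. The fixed RBF bandwidth, the N(2, 1) toy target, and the function names are assumptions made for illustration; this is not the meta-learning algorithm under review.

```python
import math


def rbf(a, b, h):
    """RBF kernel k(a, b) with bandwidth h."""
    return math.exp(-((a - b) ** 2) / (2.0 * h * h))


def svgd_step(particles, grad_log_p, step=0.1, h=1.0):
    """One SVGD update: x_i <- x_i + step * mean_j [k(x_j, x_i) * score(x_j) + d/dx_j k(x_j, x_i)]."""
    n = len(particles)
    updated = []
    for xi in particles:
        phi = 0.0
        for xj in particles:
            k = rbf(xj, xi, h)
            attraction = k * grad_log_p(xj)     # drives particles toward the target
            repulsion = k * (xi - xj) / (h * h)  # gradient of k w.r.t. xj; keeps particles apart
            phi += attraction + repulsion
        updated.append(xi + step * phi / n)
    return updated


def run_svgd(particles, grad_log_p, n_steps=500, step=0.1, h=1.0):
    """Apply `n_steps` SVGD updates and return the final particle positions."""
    for _ in range(n_steps):
        particles = svgd_step(particles, grad_log_p, step, h)
    return particles


def grad_log_normal(x, mu=2.0):
    """Score function of the toy target N(mu, 1)."""
    return -(x - mu)


final = run_svgd([-1.0, -0.5, 0.0, 0.5, 1.0], grad_log_normal)

if __name__ == "__main__":
    print(final)
```

Because the particles are not constrained to any parametric family, the same update can in principle follow multimodal or skewed posteriors, which is the flexibility the summary alludes to.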