Bayesian Inference
Copula Bayesian Networks
We present the Copula Bayesian Network model for representing multivariate continuous distributions. Our approach builds on a novel copula-based parameterization of a conditional density that, joined with a graph that encodes independencies, offers great flexibility in modeling high-dimensional densities, while maintaining control over the form of the univariate marginals. We demonstrate the advantage of our framework for generalization over standard Bayesian networks as well as tree structured copula models for varied real-life domains that are of substantially higher dimension than those typically considered in the copula literature. Papers published at the Neural Information Processing Systems Conference.
A Bayesian Approach to Concept Drift
To cope with concept drift, we placed a probability distribution over the location of the most-recent drift point. We used Bayesian model comparison to update this distribution from the predictions of models trained on blocks of consecutive observations and pruned potential drift points with low probability. We compare our approach to a non-probabilistic method for drift and a probabilistic method for change-point detection. In our experiments, our approach generally yielded improved accuracy and/or speed over these other methods. Papers published at the Neural Information Processing Systems Conference.
Fast Bayesian Inference for Non-Conjugate Gaussian Process Regression
Khan, Emtiyaz, Mohamed, Shakir, Murphy, Kevin P.
We present a new variational inference algorithm for Gaussian processes with non-conjugate likelihood functions. This includes binary and multi-class classification, as well as ordinal regression. Our method constructs a convex lower bound, which can be optimized by using an efficient fixed point update method. We then show empirically that our new approach is much faster than existing methods without any degradation in performance. Papers published at the Neural Information Processing Systems Conference.
Automated Refinement of Bayes Networks' Parameters based on Test Ordering Constraints
Khan, Omar Z., Poupart, Pascal, Agosta, John-mark M.
In this paper, we derive a method to refine a Bayes network diagnostic model by exploiting constraints implied by expert decisions on test ordering. At each step, the expert executes an evidence gathering test, which suggests the test's relative diagnostic value. We demonstrate that consistency with an expert's test selection leads to non-convex constraints on the model parameters. We incorporate these constraints by augmenting the network with nodes that represent the constraint likelihoods. Gibbs sampling, stochastic hill climbing and greedy search algorithms are proposed to find a MAP estimate that takes into account test ordering constraints and any data available.
Optimization-Based MCMC Methods for Nonlinear Hierarchical Statistical Inverse Problems
Bardsley, Johnathan, Cui, Tiangang
In many hierarchical inverse problems, not only do we want to estimate high- or infinite-dimensional model parameters in the parameter-to-observable maps, but we also have to estimate hyperparameters that represent critical assumptions in the statistical and mathematical modeling processes. As a joint effect of high-dimensionality, nonlinear dependence, and non-concave structures in the joint posterior posterior distribution over model parameters and hyperparameters, solving inverse problems in the hierarchical Bayesian setting poses a significant computational challenge. In this work, we aim to develop scalable optimization-based Markov chain Monte Carlo (MCMC) methods for solving hierarchical Bayesian inverse problems with nonlinear parameter-to-observable maps and a broader class of hyperparameters. Our algorithmic development is based on the recently developed scalable randomize-then-optimize (RTO) method [4] for exploring the high- or infinite-dimensional model parameter space. By using RTO either as a proposal distribution in a Metropolis-within-Gibbs update or as a biasing distribution in the pseudo-marginal MCMC [2], we are able to design efficient sampling tools for hierarchical Bayesian inversion. In particular, the integration of RTO and the pseudo-marginal MCMC has sampling performance robust to model parameter dimensions. We also extend our methods to nonlinear inverse problems with Poisson-distributed measurements. Numerical examples in PDE-constrained inverse problems and positron emission tomography (PET) are used to demonstrate the performance of our methods.
Posterior Ratio Estimation for Latent Variables
Zhang, Yulong, Yi, Mingxuan, Liu, Song, Kolar, Mladen
Comparing the underlying distributions of two given datasets has been an important task in machine learning community and has a wide range of applications. For example, change detection algorithms Kawahara and Sugiyama ((2012)) compare datasets collected at different time points and report how the underlying distribution has shifted over time; Transfer learning algorithms Quionero-Candela et al. ((2009)) utilize the estimated differences between two datasets to efficiently share information between different tasks. Generative Adversarial Net (GAN) Goodfellow et al. ((2014)) learns an implicit generative model whose output minimizes the differences between an artificial dataset and a real dataset. Various computational methods have been proposed for comparing underlying distributions given two sets of observations. For example, Maximum Mean Discrepancy (MMD) Gretton et al. ((2012)) computes the distance between the kernel mean embeddings of two datasets in Reproducing Kernel Hilbert Space (RKHS).
Bayesian models for Large-scale Hierarchical Classification
Gopal, Siddharth, Yang, Yiming, Bai, Bing, Niculescu-mizil, Alexandru
A challenging problem in hierarchical classification is to leverage the hierarchical relations among classes for improving classification performance. An even greater challenge is to do so in a manner that is computationally feasible for the large scale problems usually encountered in practice. This paper proposes a set of Bayesian methods to model hierarchical dependencies among class labels using multivari- ate logistic regression. Specifically, the parent-child relationships are modeled by placing a hierarchical prior over the children nodes centered around the parame- ters of their parents; thereby encouraging classes nearby in the hierarchy to share similar model parameters. We present new, efficient variational algorithms for tractable posterior inference in these models, and provide a parallel implementa- tion that can comfortably handle large-scale problems with hundreds of thousands of dimensions and tens of thousands of classes.
Bayesian nonparametric models for bipartite graphs
We develop a novel Bayesian nonparametric model for random bipartite graphs. The model is based on the theory of completely random measures and is able to handle a potentially infinite number of nodes. We show that the model has appealing properties and in particular it may exhibit a power-law behavior. We derive a posterior characterization, an Indian Buffet-like generative process for network growth, and a simple and efficient Gibbs sampler for posterior simulation. Our model is shown to be well fitted to several real-world social networks.
Bayesian estimation of discrete entropy with mixtures of stick-breaking priors
Archer, Evan, Park, Il Memming, Pillow, Jonathan W.
We consider the problem of estimating Shannon's entropy H in the under-sampled regime, where the number of possible symbols may be unknown or countably infinite. Pitman-Yor processes (a generalization of Dirichlet processes) provide tractable prior distributions over the space of countably infinite discrete distributions, and have found major applications in Bayesian non-parametric statistics and machine learning. Here we show that they also provide natural priors for Bayesian entropy estimation, due to the remarkable fact that the moments of the induced posterior distribution over H can be computed analytically. We derive formulas for the posterior mean (Bayes' least squares estimate) and variance under such priors. Moreover, we show that a fixed Dirichlet or Pitman-Yor process prior implies a narrow prior on H, meaning the prior strongly determines the entropy estimate in the under-sampled regime.
Bayesian Bias Mitigation for Crowdsourcing
Wauthier, Fabian L., Jordan, Michael I.
Biased labelers are a systemic problem in crowdsourcing, and a comprehensive toolbox for handling their responses is still being developed. A typical crowdsourcing application can be divided into three steps: data collection, data curation, and learning. At present these steps are often treated separately. We present Bayesian Bias Mitigation for Crowdsourcing (BBMC), a Bayesian model to unify all three. Most data curation methods account for the {\it effects} of labeler bias by modeling all labels as coming from a single latent truth.