Bayesian Learning
A Collapsed Variational Bayesian Inference Algorithm for Latent Dirichlet Allocation
Teh, Yee W., Newman, David, Welling, Max
Latent Dirichlet allocation (LDA) is a Bayesian network that has recently gained much popularity in applications ranging from document modeling to computer vision. Due to the large-scale nature of these applications, current inference procedures like variational Bayes and Gibbs sampling have been found lacking. In this paper we propose the collapsed variational Bayesian inference algorithm for LDA, and show that it is computationally efficient, easy to implement and significantly more accurate than standard variational Bayesian inference for LDA.
Mixture Regression for Covariate Shift
Sugiyama, Masashi, Storkey, Amos J.
In supervised learning there is a typical presumption that the training and test points are taken from the same distribution. In practice this assumption is commonly violated. The situation where the training and test data are from different distributions is called covariate shift. Recent work has examined techniques for dealing with covariate shift in terms of minimisation of generalisation error. As yet the literature lacks a Bayesian generative perspective on this problem. This paper tackles this issue for regression models. Recent work on covariate shift can be understood in terms of mixture regression. Using this view, we obtain a general approach to regression under covariate shift, which reproduces previous work as a special case. The main advantages of this new formulation over previous models for covariate shift are that we no longer need to presume the test and training densities are known, the regression and density estimation are combined into a single procedure, and previous methods are reproduced as special cases of this procedure, shedding light on the implicit assumptions the methods are making.
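For orientation, the kind of earlier technique the abstract contrasts itself with can be sketched as importance-weighted least squares, where each training point is reweighted by the ratio of test to training input density. This is only an illustration of that prior style of method, not the mixture regression model proposed in the paper. The Gaussian densities, the linear model, the synthetic data, and the helper names `gaussian_pdf` and `weighted_linear_fit` are all assumptions made for the example. Note that the densities are taken as known here, which is exactly the presumption the abstract says the new formulation removes.

```python
import numpy as np


def gaussian_pdf(x, mean, std):
    """Density of a univariate Gaussian, evaluated elementwise."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))


def weighted_linear_fit(x, y, weights):
    """Fit y ~ a + b*x by minimising sum_i weights[i] * (y[i] - a - b*x[i])**2.

    Returns the coefficients as an array [a, b].
    """
    design = np.column_stack([np.ones_like(x), x])
    root_w = np.sqrt(weights)
    # Scaling rows by sqrt(weight) turns weighted least squares into ordinary
    # least squares on the rescaled problem.
    coef, *_ = np.linalg.lstsq(design * root_w[:, None], y * root_w, rcond=None)
    return coef


# Synthetic training data (assumed): inputs from N(0, 1), nonlinear target.
rng = np.random.default_rng(0)
x_train = rng.normal(0.0, 1.0, size=200)
y_train = np.sinc(x_train) + rng.normal(0.0, 0.1, size=200)

# Assumed known densities: training inputs ~ N(0, 1), test inputs ~ N(1.5, 0.5).
importance = gaussian_pdf(x_train, 1.5, 0.5) / gaussian_pdf(x_train, 0.0, 1.0)

coef_unweighted = weighted_linear_fit(x_train, y_train, np.ones_like(x_train))
coef_weighted = weighted_linear_fit(x_train, y_train, importance)
```

With uniform weights the fit reduces to ordinary least squares; with the importance weights, training points that fall where the assumed test density is high dominate the fit.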
Hidden Markov Dirichlet Process: Modeling Genetic Recombination in Open Ancestral Space
We present a new statistical framework called hidden Markov Dirichlet process (HMDP) to jointly model the genetic recombinations among a possibly infinite number of founders and the coalescence-with-mutation events in the resulting genealogies. The HMDP posits that a haplotype of genetic markers is generated by a sequence of recombination events that select an ancestor for each locus from an unbounded set of founders according to a 1st-order Markov transition process. Conjoining this process with a mutation model, our method accommodates both between-lineage recombination and within-lineage sequence variations, and leads to a compact and natural interpretation of the population structure and inheritance process underlying haplotype data. We have developed an efficient sampling algorithm for HMDP based on a two-level nested Pólya urn scheme. On both simulated and real SNP haplotype data, our method performs competitively or significantly better than extant methods in uncovering the recombination hotspots along chromosomal loci; in addition, it infers the ancestral genetic patterns and offers a highly accurate map of ancestral compositions of modern populations.
Theory and Dynamics of Perceptual Bistability
Schrater, Paul R., Sundareswara, Rashmi
Perceptual bistability refers to the phenomenon of spontaneously switching between two or more interpretations of an image under continuous viewing. Although switching behavior is increasingly well characterized, the origins remain elusive. We propose that perceptual switching naturally arises from the brain's search for best interpretations while performing Bayesian inference. In particular, we propose that the brain explores a posterior distribution over image interpretations at a rapid time scale via a sampling-like process and updates its interpretation when a sampled interpretation is better than the discounted value of its current interpretation. We formalize the theory, explicitly derive switching rate distributions and discuss qualitative properties of the theory including the effect of changes in the posterior distribution on switching rates. Finally, predictions of the theory are shown to be consistent with measured changes in human switching dynamics to Necker cube stimuli induced by context.
Parameter Expanded Variational Bayesian Methods
Bayesian inference has become increasingly important in statistical machine learning. Exact Bayesian calculations are often not feasible in practice, however. A number of approximate Bayesian methods have been proposed to make such calculations practical, among them the variational Bayesian (VB) approach. The VB approach, while useful, can nevertheless suffer from slow convergence to the approximate solution. To address this problem, we propose Parameter-eXpanded Variational Bayesian (PX-VB) methods to speed up VB. The new algorithm is inspired by parameter-expanded expectation maximization (PX-EM) and parameter-expanded data augmentation (PX-DA). Similar to PX-EM and -DA, PX-VB expands a model with auxiliary variables to reduce the coupling between variables in the original model. We analyze the convergence rates of VB and PX-VB and demonstrate the superior convergence rates of PX-VB in variational probit regression and automatic relevance determination.
Bayesian Model Scoring in Markov Random Fields
Scoring structures of undirected graphical models by means of evaluating the marginal likelihood is very hard. The main reason is the presence of the partition function, which is intractable to evaluate, let alone integrate over. We propose to approximate the marginal likelihood by employing two levels of approximation: we assume normality of the posterior (the Laplace approximation) and approximate all remaining intractable quantities using belief propagation and the linear response approximation.
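The first level of approximation mentioned above can be written out using the standard Laplace approximation to a marginal likelihood. The notation is assumed for illustration and is not taken from the paper: $D$ is the data, $G$ the graph structure, $\theta \in \mathbb{R}^d$ the parameters, $\hat{\theta}$ the posterior mode, and $H$ the negative Hessian of the log joint at that mode.

```latex
\begin{align}
\log p(D \mid G)
  &= \log \int p(D \mid \theta, G)\, p(\theta \mid G)\, d\theta \\
  &\approx \log p(D \mid \hat{\theta}, G) + \log p(\hat{\theta} \mid G)
     + \frac{d}{2} \log 2\pi - \frac{1}{2} \log \lvert H \rvert ,
\\
H &= -\nabla_{\theta}^{2} \,\log \bigl[\, p(D \mid \theta, G)\, p(\theta \mid G) \,\bigr]
     \Big|_{\theta = \hat{\theta}} .
\end{align}
```

In an undirected model the likelihood term contains the log partition function, and the Hessian contains its second derivatives, so both remain intractable after the Laplace step. These are the remaining quantities that the abstract says are handled with belief propagation and linear response.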
A Nonparametric Bayesian Method for Inferring Features From Similarity Judgments
Navarro, Daniel J., Griffiths, Thomas L.
The additive clustering model is widely used to infer the features of a set of stimuli from their similarities, on the assumption that similarity is a weighted linear function of common features. This paper develops a fully Bayesian formulation of the additive clustering model, using methods from nonparametric Bayesian statistics to allow the number of features to vary. We use this to explore several approaches to parameter estimation, showing that the nonparametric Bayesian approach provides a straightforward way to obtain estimates of both the number of features used in producing similarity judgments and their importance.
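The assumption that similarity is a weighted linear function of common features can be made concrete with a small sketch. It assumes the usual additive clustering form, in which the similarity between stimuli i and j is the sum of the weights of the features they share; the function name `additive_clustering_similarity`, the optional additive constant, and the toy feature matrix are illustrative choices, not details taken from the paper.

```python
import numpy as np


def additive_clustering_similarity(features, weights, additive_constant=0.0):
    """Model similarities s[i, j] = sum_k weights[k] * f[i, k] * f[j, k] + c.

    features: (n_stimuli, n_features) binary matrix, 1 if a stimulus has a feature.
    weights:  (n_features,) non-negative feature saliences.
    """
    features = np.asarray(features, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Scale each feature column by its weight, then count shared features.
    return (features * weights) @ features.T + additive_constant


# Toy example (assumed): three stimuli, two features.
features = np.array([[1, 0],
                     [1, 1],
                     [0, 1]])
weights = np.array([0.5, 0.3])

similarity = additive_clustering_similarity(features, weights)
# Stimuli 0 and 1 share only feature 0, so their similarity is 0.5.
# Stimuli 1 and 2 share only feature 1, so their similarity is 0.3.
# Stimuli 0 and 2 share no features, so their similarity is 0.0.
```

Only the off-diagonal entries are usually of interest, since they correspond to similarity judgments between distinct stimuli. Inference runs in the opposite direction to this sketch: given observed similarities, the model recovers the feature matrix and weights, and the nonparametric formulation also infers the number of feature columns.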
Non-rigid point set registration: Coherent Point Drift
Myronenko, Andriy, Song, Xubo, Carreira-Perpiñán, Miguel Á.
We introduce Coherent Point Drift (CPD), a novel probabilistic method for nonrigid registration of point sets. The registration is treated as a Maximum Likelihood (ML) estimation problem with motion coherence constraint over the velocity field such that one point set moves coherently to align with the second set. We formulate the motion coherence constraint and derive a solution of regularized ML estimation through the variational approach, which leads to an elegant kernel form. We also derive the EM algorithm for the penalized ML optimization with deterministic annealing. The CPD method simultaneously finds both the nonrigid transformation and the correspondence between two point sets without making any prior assumption of the transformation model except that of motion coherence. This method can estimate complex nonlinear nonrigid transformations, and is shown to be accurate on 2D and 3D examples and robust in the presence of outliers and missing points.
Modeling Dyadic Data with Binary Latent Factors
Meeds, Edward, Ghahramani, Zoubin, Neal, Radford M., Roweis, Sam T.
We introduce binary matrix factorization, a novel model for unsupervised matrix decomposition. The decomposition is learned by fitting a nonparametric Bayesian probabilistic model with binary latent variables to a matrix of dyadic data. Unlike bi-clustering models, which assign each row or column to a single cluster based on a categorical hidden feature, our binary feature model reflects the prior belief that items and attributes can be associated with more than one latent cluster at a time. We provide simple learning and inference rules for this new model and show how to extend it to an infinite model in which the number of features is not a priori fixed but is allowed to grow with the size of the data.