Bayesian Inference


Indian Buffet Processes with Power-law Behavior

Neural Information Processing Systems

The Indian buffet process (IBP) is an exchangeable distribution over binary matrices used in Bayesian nonparametric featural models. In this paper we propose a three-parameter generalization of the IBP exhibiting power-law behavior. We achieve this by generalizing the beta process (the de Finetti measure of the IBP) to the stable-beta process and deriving the IBP corresponding to it. We find interesting relationships between the stable-beta process and the Pitman-Yor process (another stochastic process used in Bayesian nonparametric models with interesting power-law properties). We show that our power-law IBP is a good model for word occurrences in documents, with improved performance over the standard IBP.
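As a rough illustration of the buffet metaphor, the sketch below draws a binary matrix from a three-parameter IBP with mass `alpha`, concentration `c`, and stability `sigma`. The sampling rule used here (customer n takes an existing dish k with probability (m_k - sigma)/(n - 1 + c), then tries a Poisson number of new dishes) is our reading of the construction and should be checked against the paper; with `c = 1` and `sigma = 0` it reduces to the standard one-parameter IBP, where customer n tries Poisson(alpha / n) new dishes. Function names are hypothetical.

```python
import math
import random


def new_dish_rate(n, alpha, c, sigma):
    """Expected number of new dishes for customer n (1-indexed).

    Assumed form: alpha * G(1+c) G(n-1+c+sigma) / (G(n+c) G(c+sigma)),
    evaluated in log space with math.lgamma for numerical stability.
    """
    log_rate = (math.lgamma(1 + c) + math.lgamma(n - 1 + c + sigma)
                - math.lgamma(n + c) - math.lgamma(c + sigma))
    return alpha * math.exp(log_rate)


def poisson(lam, rng):
    """Knuth's multiplicative Poisson sampler (adequate for small lam)."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_ibp(num_customers, alpha, c=1.0, sigma=0.0, seed=0):
    """Return Z as a list of rows; Z[n][k] == 1 if customer n took dish k."""
    assert alpha > 0 and 0 <= sigma < 1 and c > -sigma
    rng = random.Random(seed)
    counts = []  # counts[k] = number of earlier customers who took dish k
    rows = []
    for n in range(1, num_customers + 1):
        row = []
        # Revisit each existing dish with probability (m_k - sigma) / (n - 1 + c).
        for k, m_k in enumerate(counts):
            take = rng.random() < (m_k - sigma) / (n - 1 + c)
            row.append(int(take))
            counts[k] += int(take)
        # Then try a Poisson number of brand-new dishes.
        k_new = poisson(new_dish_rate(n, alpha, c, sigma), rng)
        row.extend([1] * k_new)
        counts.extend([1] * k_new)
        rows.append(row)
    width = len(counts)
    # Pad earlier rows with zeros for dishes introduced after them.
    return [row + [0] * (width - len(row)) for row in rows]
```

With `sigma > 0` the number of distinct dishes is expected to grow roughly like a power of the number of customers rather than logarithmically, which is the kind of power-law behavior the abstract refers to.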


Learning Label Embeddings for Nearest-Neighbor Multi-class Classification with an Application to Speech Recognition

Neural Information Processing Systems

We consider the problem of using nearest-neighbor methods to provide a conditional probability estimate, P(y|a), when the number of labels y is large and the labels share some underlying structure. We propose a method for learning error-correcting output codes (ECOCs) to model the similarity between labels within a nearest-neighbor framework. The learned ECOCs and nearest-neighbor information are used to provide conditional probability estimates. We apply these estimates to the problem of acoustic modeling for speech recognition. On a lecture recognition task, we demonstrate an absolute reduction in word error rate (WER) of 0.9% (a 2.5% relative reduction) over a state-of-the-art GMM baseline.
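The sketch below contrasts a plain k-nearest-neighbor estimate of P(y|a) (the fraction of the k nearest training examples carrying label y) with a hypothetical code-based variant in which every label owns a binary codeword and neighbors vote through their codewords, so labels with similar codes share probability mass. The codewords are hand-written stand-ins for learned ECOCs, and the softmax-style decoding with temperature `beta` is an illustrative choice, not the paper's estimator.

```python
import math
from collections import Counter


def nearest_labels(query, train_x, train_y, k):
    """Labels of the k training points closest to `query` (squared Euclidean)."""
    scored = sorted(
        (sum((q - x) ** 2 for q, x in zip(query, xi)), i)
        for i, xi in enumerate(train_x)
    )
    return [train_y[i] for _, i in scored[:k]]


def knn_posterior(query, train_x, train_y, k):
    """Plain kNN estimate: P(y|a) = (# neighbors with label y) / k."""
    votes = Counter(nearest_labels(query, train_x, train_y, k))
    return {y: n / k for y, n in votes.items()}


def code_posterior(query, train_x, train_y, codes, k, beta=1.0):
    """Hypothetical ECOC-style estimate.

    codes maps each label to a tuple of +1/-1 bits. The neighbors' codewords
    are averaged into a soft code; every label is scored by
    exp(-beta * squared distance to that soft code), then normalized.
    """
    neigh = nearest_labels(query, train_x, train_y, k)
    dim = len(next(iter(codes.values())))
    soft = [sum(codes[y][d] for y in neigh) / k for d in range(dim)]
    scores = {
        y: math.exp(-beta * sum((s - b) ** 2 for s, b in zip(soft, code)))
        for y, code in codes.items()
    }
    total = sum(scores.values())
    return {y: s / total for y, s in scores.items()}
```

Unlike the plain estimate, the code-based one gives nonzero probability to labels that never appear among the neighbors, in proportion to how close their codewords are to the neighbors' codes.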


Occlusive Components Analysis

Neural Information Processing Systems

We study unsupervised learning in a probabilistic generative model for occlusion. The model uses two types of latent variables: one indicates which objects are present in the image, and the other how they are ordered in depth. This depth order then determines how the positions and appearances of the objects present, specified in the model parameters, combine to form the image. We show that the object parameters can be learnt from an unlabelled set of images in which objects occlude one another. Exact maximum-likelihood learning is intractable. However, we show that tractable approximations to Expectation Maximization (EM) can be found if the training images each contain only a small number of objects on average. Numerical experiments show that these approximations recover the correct set of object parameters. Experiments on a novel version of the bars test using colored bars, and experiments on more realistic data, show that the algorithm performs well in extracting the generating causes. Experiments based on the standard bars benchmark test for object learning show that the algorithm performs well in comparison to other recent component extraction approaches. The model and the learning algorithm thus connect research on occlusion with the research field of multiple-cause component extraction methods.
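To make the generative story concrete, the sketch below samples one image from a toy occlusion model in the spirit of the colored bars test: each of the 2·D possible bars (D horizontal, D vertical) is present independently with probability `pi`, the present bars are put in a uniformly random depth order, and they are painted back to front so that nearer bars overwrite farther ones. The grid size, the one-color-per-bar palette, and the uniform depth prior are assumptions for illustration; the paper's model has its own parameterization of object positions and appearances that this toy does not try to reproduce.

```python
import random

BACKGROUND = 0


def bar_pixels(index, size):
    """Pixels covered by bar `index`: 0..size-1 are rows, size..2*size-1 columns."""
    if index < size:
        return [(index, col) for col in range(size)]
    return [(row, index - size) for row in range(size)]


def sample_occlusion_image(size=5, pi=0.2, seed=None):
    """Sample (image, present, depth_order) from a toy bars occlusion model."""
    rng = random.Random(seed)
    num_bars = 2 * size
    colors = list(range(1, num_bars + 1))  # one distinct color per bar
    # Latent variable 1: which objects are present.
    present = [i for i in range(num_bars) if rng.random() < pi]
    # Latent variable 2: their depth order, listed from farthest to nearest.
    depth_order = present[:]
    rng.shuffle(depth_order)
    image = [[BACKGROUND] * size for _ in range(size)]
    # Paint back to front: nearer bars occlude farther ones.
    for bar in depth_order:
        for r, c in bar_pixels(bar, size):
            image[r][c] = colors[bar]
    return image, present, depth_order
```

Where a horizontal and a vertical bar cross, the pixel takes the color of whichever bar is nearer, which is exactly the non-linear combination that makes exact EM in such models expensive.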


Bayesian Belief Polarization

Neural Information Processing Systems

Empirical studies have documented cases of belief polarization, where two people with opposing prior beliefs both strengthen their beliefs after observing the same evidence. Belief polarization is frequently offered as evidence of human irrationality, but we demonstrate that this phenomenon is consistent with a fully Bayesian approach to belief revision. Simulation results indicate that belief polarization is not only possible but relatively common within the set of Bayesian models that we consider. Suppose that Carol has requested a promotion at her company and has received a score of 50 on an aptitude test. Alice, one of the company's managers, began with a high opinion of Carol and became even more confident of her abilities after seeing her test score.
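The Carol example can be turned into a small worked calculation. In the sketch below, H is whether Carol is suited for the promotion and V is an auxiliary belief about the test (V = 1: a score of 50 is good on this test; V = 0: it is poor), and the observed score depends on both. The network H → D ← V, the second manager "Bob", and every probability are invented for illustration rather than taken from the paper; the point is that when two people disagree about V as well as H, the same score can push their beliefs about H further apart under exact Bayesian updating.

```python
def posterior_h(p_h, p_v, likelihood):
    """P(H=1 | D) for a network H -> D <- V with independent binary H and V.

    likelihood[(h, v)] = P(D | H=h, V=v) for the observed evidence D.
    """
    joint = {0: 0.0, 1: 0.0}
    for h in (0, 1):
        for v in (0, 1):
            prior = (p_h if h else 1 - p_h) * (p_v if v else 1 - p_v)
            joint[h] += prior * likelihood[(h, v)]
    return joint[1] / (joint[0] + joint[1])


# Hypothetical P(score of 50 | H, V): the score favors H=1 only if V=1.
LIKELIHOOD = {(1, 1): 0.8, (0, 1): 0.2, (1, 0): 0.2, (0, 0): 0.8}

alice = posterior_h(p_h=0.7, p_v=0.9, likelihood=LIKELIHOOD)  # prior 0.7
bob = posterior_h(p_h=0.3, p_v=0.1, likelihood=LIKELIHOOD)    # prior 0.3
print(f"Alice: 0.70 -> {alice:.3f}, Bob: 0.30 -> {bob:.3f}")
# prints "Alice: 0.70 -> 0.869, Bob: 0.30 -> 0.131"
```

Both managers update correctly from the same evidence, yet Alice's confidence rises and Bob's falls, because each interprets the score through a different belief about the test.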



Learning Transformational Invariants from Natural Movies

Neural Information Processing Systems

We describe a hierarchical, probabilistic model that learns to extract complex motion from movies of the natural environment. The model consists of two hidden layers: the first layer produces a sparse representation of the image that is expressed in terms of local amplitude and phase variables. The second layer learns the higher-order structure among the time-varying phase variables. After training on natural movies, the top-layer units discover the structure of phase shifts within the first layer.
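The first-layer representation can be illustrated with complex-valued coefficients: each coefficient's modulus is a local amplitude and its angle a local phase, and the frame-to-frame change in phase is the kind of time-varying quantity the second layer is described as modeling. The sketch below only computes these quantities from hypothetical coefficients; it does not learn the basis functions or the second layer, and the function names are our own.

```python
import cmath
import math


def amplitude_phase(coeffs):
    """Split complex coefficients into (amplitude, phase) pairs."""
    return [(abs(z), cmath.phase(z)) for z in coeffs]


def wrap_angle(theta):
    """Map an angle into [-pi, pi] via atan2."""
    return math.atan2(math.sin(theta), math.cos(theta))


def phase_shifts(frames):
    """Per-coefficient phase change between consecutive frames.

    frames is a list of equal-length lists of complex coefficients, one per
    movie frame; the result has one fewer row than `frames`.
    """
    shifts = []
    for prev, curr in zip(frames, frames[1:]):
        shifts.append([wrap_angle(cmath.phase(b) - cmath.phase(a))
                       for a, b in zip(prev, curr)])
    return shifts
```

Wrapping matters: moving from phase 3.0 to phase -3.0 is a raw difference of -6 radians, but it corresponds to a small forward shift of about 0.28 radians.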


Stochastic Relational Models for Large-scale Dyadic Data using MCMC

Neural Information Processing Systems

Stochastic relational models (SRMs) [15] provide a rich family of choices for learning and predicting dyadic data between two sets of entities. The models generalize matrix factorization to a supervised learning problem that utilizes attributes of entities in a hierarchical Bayesian framework. Variational Bayes inference has previously been applied to SRMs, but it does not scale when the size of either entity set grows to tens of thousands. In this paper, we introduce a Markov chain Monte Carlo (MCMC) algorithm for models equivalent to SRMs in order to scale the computation to very large dyadic data sets. Both superior scalability and predictive accuracy are demonstrated on a collaborative filtering problem involving tens of thousands of users and half a million items.
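The paper's sampler targets SRMs, which we do not reproduce here. As a much simpler stand-in that shows why Gibbs-style MCMC can scale to large dyadic data, the sketch below runs a Gibbs sampler for rank-1 Bayesian matrix factorization: each user factor `u[i]` and item factor `v[j]` has a Gaussian full conditional that touches only that row's or column's observed entries, so one sweep costs time linear in the number of observations. The rank, the fixed prior precision `lam`, and the noise level `sigma` are assumptions of this illustration.

```python
import math
import random
from collections import defaultdict


def gibbs_rank1_mf(ratings, num_iters=300, burn_in=100, lam=1.0, sigma=0.1, seed=0):
    """Gibbs sampler for r_ij ~ N(u_i * v_j, sigma^2), u_i, v_j ~ N(0, 1/lam).

    ratings: dict mapping (user, item) -> observed value.
    Returns the posterior-mean prediction for every observed (user, item).
    """
    rng = random.Random(seed)
    by_user, by_item = defaultdict(list), defaultdict(list)
    for (i, j), r in ratings.items():
        by_user[i].append((j, r))
        by_item[j].append((i, r))
    u = {i: rng.gauss(0.0, 1.0) for i in by_user}
    v = {j: rng.gauss(0.0, 1.0) for j in by_item}
    noise_prec = 1.0 / sigma ** 2

    def draw(neighbors, other):
        # Gaussian full conditional: precision lam + sum(other^2) / sigma^2.
        prec = lam + noise_prec * sum(other[k] ** 2 for k, _ in neighbors)
        mean = noise_prec * sum(other[k] * r for k, r in neighbors) / prec
        return rng.gauss(mean, 1.0 / math.sqrt(prec))

    sums = defaultdict(float)
    kept = 0
    for it in range(num_iters):
        for i, nbrs in by_user.items():
            u[i] = draw(nbrs, v)
        for j, nbrs in by_item.items():
            v[j] = draw(nbrs, u)
        if it >= burn_in:
            # Average predictions over post-burn-in samples.
            kept += 1
            for i, j in ratings:
                sums[(i, j)] += u[i] * v[j]
    return {key: total / kept for key, total in sums.items()}
```

Hyperpriors on `lam`, higher rank, and the attribute-dependent structure that distinguishes SRMs from plain matrix factorization are all omitted.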


Which graphical models are difficult to learn?

Neural Information Processing Systems

We consider the problem of learning the structure of Ising models (pairwise binary Markov random fields) from i.i.d. samples. While several methods have been proposed to accomplish this task, their relative merits and limitations remain somewhat obscure. By analyzing a number of concrete examples, we show that low-complexity algorithms systematically fail when the Markov random field develops long-range correlations. More precisely, this phenomenon appears to be related to the Ising model phase transition (although it does not coincide with it).
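A simple example of a low-complexity structure learner is correlation thresholding: declare an edge between i and j whenever the correlation E[x_i x_j] exceeds a threshold. The sketch below computes exact correlations for a tiny zero-field Ising model by enumerating all 2^p spin configurations and then applies the threshold. The chain graph, coupling values, and threshold are assumptions for illustration; with finite i.i.d. samples, as in the paper, the empirical correlations are also noisy.

```python
import itertools
import math


def exact_correlations(p, couplings):
    """E[x_i x_j] under P(x) = exp(sum_(i,j) theta_ij x_i x_j) / Z, x in {-1,+1}^p."""
    z = 0.0
    moments = {(i, j): 0.0 for i in range(p) for j in range(i + 1, p)}
    for x in itertools.product((-1, 1), repeat=p):
        w = math.exp(sum(theta * x[i] * x[j] for (i, j), theta in couplings.items()))
        z += w
        for i, j in moments:
            moments[(i, j)] += w * x[i] * x[j]
    return {pair: m / z for pair, m in moments.items()}


def threshold_edges(correlations, tau):
    """Estimated edge set: all pairs whose correlation exceeds tau."""
    return {pair for pair, c in correlations.items() if c > tau}


def chain(p, theta):
    """Couplings for a path graph 0 - 1 - ... - (p-1) with uniform strength."""
    return {(i, i + 1): theta for i in range(p - 1)}


weak = exact_correlations(4, chain(4, 0.5))
strong = exact_correlations(4, chain(4, 2.0))
print(sorted(threshold_edges(weak, tau=0.35)))
# Gap between an edge and the farthest non-edge, at weak vs strong coupling.
print(weak[(0, 1)] - weak[(0, 3)], strong[(0, 1)] - strong[(0, 3)])
```

On this tree the exact correlations are products of tanh(theta) along the path, so edges and non-edges remain separable, but the margin shrinks sharply at strong coupling as correlations become long-range. This toy only illustrates the shrinking margin; it does not reproduce the paper's failure examples.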


Posterior Consistency of the Silverman g-prior in Bayesian Model Choice

Neural Information Processing Systems

Kernel supervised learning methods can be unified using tools from regularization theory. The duality between regularization and priors leads to interpreting regularization methods in terms of maximum a posteriori estimation and has motivated Bayesian interpretations of kernel methods. In this paper we pursue a Bayesian interpretation of sparsity in the kernel setting by using a mixture of a point-mass distribution and a prior that we refer to as "Silverman's g-prior." We provide a theoretical analysis of the posterior consistency of a Bayesian model choice procedure based on this prior. We also establish the asymptotic relationship between this procedure and the Bayesian information criterion.
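For orientation, the generic ingredients can be written as below. This is a standard point-mass mixture (spike-and-slab) formulation together with the textbook BIC and the classical Schwarz approximation, not the paper's exact prior: the indicator vector $\gamma$, the inclusion probability $\pi$, and the slab density $p_{\text{slab}}$ are placeholders, and the specific Silverman g-prior used as the slab is defined in the paper.

```latex
% Point-mass mixture: excluded coefficients are exactly zero
\gamma_j \sim \operatorname{Bernoulli}(\pi), \qquad
\beta_j = 0 \ \text{if}\ \gamma_j = 0, \qquad
\beta_\gamma \mid \gamma \sim p_{\text{slab}}(\beta_\gamma \mid \gamma)

% Posterior probability of a model (sparsity pattern) \gamma
P(\gamma \mid y) \propto P(\gamma) \int p(y \mid \beta, \gamma)\, p(\beta \mid \gamma)\, d\beta

% Posterior consistency: the true model \gamma^\ast is selected asymptotically
P(\gamma^\ast \mid y_1, \dots, y_n) \xrightarrow{\;p\;} 1 \quad (n \to \infty)

% BIC for a model with k_\gamma free parameters, and the Schwarz approximation
\mathrm{BIC}(\gamma) = -2 \log p(y \mid \hat\beta_\gamma, \gamma) + k_\gamma \log n, \qquad
\log p(y \mid \gamma) = -\tfrac{1}{2}\,\mathrm{BIC}(\gamma) + O_p(1)
```

The last line holds for fixed-dimensional models under standard regularity conditions; it is the usual yardstick against which asymptotic comparisons between a Bayesian model choice procedure and BIC are made.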


Fast Computation of Posterior Mode in Multi-Level Hierarchical Models

Neural Information Processing Systems

Multi-level hierarchical models provide an attractive framework for incorporating correlations induced in a response variable organized in a hierarchy. Model fitting is challenging, especially for hierarchies with a large number of nodes. We provide a novel algorithm based on a multi-scale Kalman filter that is both scalable and easy to implement. For non-Gaussian responses, a quadratic approximation to the log-likelihood results in biased estimates. We suggest a bootstrap strategy to correct such biases. Our method is illustrated through simulation studies and analyses of real-world data sets in health care and online advertising.
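For Gaussian responses, the idea behind a multi-scale Kalman filter can be seen on a two-level tree: an upward pass summarizes each group's data into a message about the root, and a downward pass propagates the root's posterior back to the groups. Because the posterior is Gaussian, its mean equals its mode. The sketch below assumes known variances `tau0_sq` (root prior), `tau_sq` (groups around the root), and `sigma_sq` (observation noise) and a zero-mean root prior; it covers neither deeper hierarchies nor the bootstrap bias correction for non-Gaussian responses described above.

```python
def posterior_mode_two_level(groups, tau0_sq, tau_sq, sigma_sq):
    """Posterior mode of (mu, theta_g) for the Gaussian hierarchy
        mu ~ N(0, tau0_sq),  theta_g ~ N(mu, tau_sq),  y_gi ~ N(theta_g, sigma_sq).

    groups maps a group id to its (non-empty) list of observations.
    """
    # Upward pass: given mu, each group mean ybar_g ~ N(mu, tau_sq + sigma_sq / n_g).
    root_prec = 1.0 / tau0_sq
    root_num = 0.0
    stats = {}
    for g, ys in groups.items():
        n, ybar = len(ys), sum(ys) / len(ys)
        stats[g] = (n, ybar)
        msg_var = tau_sq + sigma_sq / n
        root_prec += 1.0 / msg_var
        root_num += ybar / msg_var
    mu = root_num / root_prec

    # Downward pass: E[theta_g | mu, y_g] is linear in mu, so plug in E[mu | y].
    theta = {}
    for g, (n, ybar) in stats.items():
        prec = 1.0 / tau_sq + n / sigma_sq
        theta[g] = (mu / tau_sq + n * ybar / sigma_sq) / prec
    return mu, theta
```

Each pass touches every group once, so the cost is linear in the number of nodes and observations, which is the property that makes tree-structured filtering attractive for large hierarchies.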