Learning Graphical Models
Large Margin Hidden Markov Models for Automatic Speech Recognition
We study the problem of parameter estimation in continuous density hidden Markov models (CD-HMMs) for automatic speech recognition (ASR). As in support vectormachines, we propose a learning algorithm based on the goal of margin maximization. Unlike earlier work on max-margin Markov networks, our approach is specifically geared to the modeling of real-valued observations (such as acoustic feature vectors) using Gaussian mixture models. Unlike previous discriminative frameworksfor ASR, such as maximum mutual information and minimum classification error, our framework leads to a convex optimization, without any spurious local minima. The objective function for large margin training of CD-HMMs is defined over a parameter space of positive semidefinite matrices. Its optimization can be performed efficiently with simple gradient-based methods thatscale well to large problems. We obtain competitive results for phonetic recognition on the TIMIT speech corpus.
Theory and Dynamics of Perceptual Bistability
Schrater, Paul R., Sundareswara, Rashmi
Perceptual Bistability refers to the phenomenon of spontaneously switching between twoor more interpretations of an image under continuous viewing. Although switchingbehavior is increasingly well characterized, the origins remain elusive. We propose that perceptual switching naturally arises from the brain's search for best interpretations while performing Bayesian inference. In particular, we propose that the brain explores a posterior distribution over image interpretations ata rapid time scale via a sampling-like process and updates its interpretation when a sampled interpretation is better than the discounted value of its current interpretation. Weformalize the theory, explicitly derive switching rate distributions and discuss qualitative properties of the theory including the effect of changes in the posterior distribution on switching rates. Finally, predictions of the theory are shown to be consistent with measured changes in human switching dynamics to Necker cube stimuli induced by context.
Large Scale Hidden Semi-Markov SVMs
Rätsch, Gunnar, Sonnenburg, Sören
We describe Hidden Semi-Markov Support Vector Machines (SHM SVMs), an extension of HM SVMs to semi-Markov chains. This allows us to predict segmentations ofsequences based on segment-based features measuring properties such as the length of the segment. We propose a novel technique to partition the problem into sub-problems. The independently obtained partial solutions can then be recombined in an efficient way, which allows us to solve label sequence learning problemswith several thousands of labeled sequences. We have tested our algorithm for predicting gene structures, an important problem in computational biology. Results on a well-known model organism illustrate the great potential of SHM SVMs in computational biology.
Parameter Expanded Variational Bayesian Methods
Bayesian inference has become increasingly important in statistical machine learning. Exact Bayesian calculations are often not feasible in practice, however. A number of approximate Bayesian methods have been proposed to make such calculations practical, among them the variational Bayesian (VB) approach. The VB approach, while useful, can nevertheless suffer from slow convergence to the approximate solution. To address this problem, we propose Parameter-eXpanded Variational Bayesian (PX-VB) methods to speed up VB. The new algorithm is inspired byparameter-expanded expectation maximization (PX-EM) and parameterexpanded dataaugmentation (PX-DA). Similar to PX-EM and -DA, PX-VB expands a model with auxiliary variables to reduce the coupling between variables in the original model. We analyze the convergence rates of VB and PX-VB and demonstrate the superior convergence rates of PX-VB in variational probit regression andautomatic relevance determination.
Bayesian Model Scoring in Markov Random Fields
Scoring structures of undirected graphical models by means of evaluating the marginal likelihood is very hard. The main reason is the presence of the partition functionwhich is intractable to evaluate, let alone integrate over. We propose to approximate the marginal likelihood by employing two levels of approximation: we assume normality of the posterior (the Laplace approximation) and approximate allremaining intractable quantities using belief propagation and the linear response approximation.
The Neurodynamics of Belief Propagation on Binary Markov Random Fields
We rigorously establish a close relationship between message passing algorithms and models of neurodynamics by showing that the equations of a continuous Hopfield networkcan be derived from the equations of belief propagation on a binary Markov random field. As Hopfield networks are equipped with a Lyapunov function, convergenceis guaranteed. As a consequence, in the limit of many weak connections perneuron, Hopfield networks exactly implement a continuous-time variant of belief propagation starting from message initialisations that prevent from running into convergence problems. Our results lead to a better understanding of the role of message passing algorithms in real biological neural networks.
A Nonparametric Bayesian Method for Inferring Features From Similarity Judgments
Navarro, Daniel J., Griffiths, Thomas L.
The additive clustering model is widely used to infer the features of a set of stimuli from their similarities, on the assumption that similarity is a weighted linear function ofcommon features. This paper develops a fully Bayesian formulation of the additive clustering model, using methods from nonparametric Bayesian statistics to allow the number of features to vary. We use this to explore several approaches to parameter estimation, showing that the nonparametric Bayesian approach provides astraightforward way to obtain estimates of both the number of features used in producing similarity judgments and their importance.
Non-rigid point set registration: Coherent Point Drift
Myronenko, Andriy, Song, Xubo, Carreira-Perpiñán, Miguel Á.
We introduce Coherent Point Drift (CPD), a novel probabilistic method for nonrigid registrationof point sets. The registration is treated as a Maximum Likelihood (ML)estimation problem with motion coherence constraint over the velocity field such that one point set moves coherently to align with the second set. We formulate the motion coherence constraint and derive a solution of regularized ML estimation through the variational approach, which leads to an elegant kernel form. We also derive the EM algorithm for the penalized ML optimization with deterministic annealing. The CPD method simultaneously finds both the nonrigid transformation and the correspondence between two point sets without making any prior assumption of the transformation model except that of motion coherence. Thismethod can estimate complex nonlinear nonrigid transformations, and is shown to be accurate on 2D and 3D examples and robust in the presence of outliers and missing points.
Modeling Dyadic Data with Binary Latent Factors
Meeds, Edward, Ghahramani, Zoubin, Neal, Radford M., Roweis, Sam T.
We introduce binary matrix factorization, a novel model for unsupervised matrix decomposition.The decomposition is learned by fitting a nonparametric Bayesian probabilistic model with binary latent variables to a matrix of dyadic data. Unlike bi-clustering models, which assign each row or column to a single cluster based on a categorical hidden feature, our binary feature model reflects the prior belief that items and attributes can be associated with more than one latent cluster at a time. We provide simple learning and inference rules for this new model and show how to extend it to an infinite model in which the number of features is not a priori fixed but is allowed to grow with the size of the data.