Markov Models
Learning Factored Representations for Partially Observable Markov Decision Processes
The problem of reinforcement learning in a non-Markov environment is explored using a dynamic Bayesian network, where conditional independence assumptions between random variables are compactly represented by network parameters. The parameters are learned on-line, and approximations are used to perform inference and to compute the optimal value function. The relative effects of inference and value function approximations on the quality of the final policy are investigated by learning to solve a moderately difficult driving task. The two value function approximations, linear and quadratic, were found to perform similarly, but the quadratic model was more sensitive to initialization. Both performed below the level of human performance on the task.
Reinforcement Learning Using Approximate Belief States
The problem of developing good policies for partially observable Markov decision problems (POMDPs) remains one of the most challenging areas of research in stochastic planning. One line of research in this area involves the use of reinforcement learning with belief states, probability distributions over the underlying model states. This is a promising method for small problems, but its application is limited by the intractability of computing or representing a full belief state for large problems. Recent work shows that, in many settings, we can maintain an approximate belief state, which is fairly close to the true belief state. In particular, great success has been shown with approximate belief states that marginalize out correlations between state variables.
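The idea of a belief state that marginalizes out correlations between state variables can be illustrated with a minimal sketch. It assumes a small discrete model with a fixed action, a transition matrix `T`, and an observation matrix `O`. These names, and the product-of-marginals projection shown here, are illustrative assumptions and are not taken from the abstract.

```python
import numpy as np

def belief_update(belief, T, O, obs):
    """One exact Bayes-filter step for a fixed action.

    T[s, s2] = P(s2 | s), O[s2, o] = P(o | s2) (hypothetical layout).
    """
    predicted = belief @ T              # push the belief through the dynamics
    unnorm = predicted * O[:, obs]      # weight by the observation likelihood
    return unnorm / unnorm.sum()        # renormalize to a distribution

def project_to_marginals(belief, shape):
    """Replace a joint belief by the product of its per-variable marginals."""
    joint = belief.reshape(shape)
    approx = np.ones(shape)
    for axis in range(len(shape)):
        other = tuple(a for a in range(len(shape)) if a != axis)
        marginal = joint.sum(axis=other)        # marginal of one state variable
        view = [1] * len(shape)
        view[axis] = shape[axis]
        approx = approx * marginal.reshape(view)  # broadcast into the product
    return approx.reshape(-1)
```

Under these assumptions, an approximate filter would call `belief_update` and then `project_to_marginals` at every step, so that only the per-variable marginals need to be stored rather than the full joint.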
Factored Semi-Tied Covariance Matrices
A new form of covariance modelling for Gaussian mixture models and hidden Markov models is presented. This is an extension to an efficient form of covariance modelling used in speech recognition, semi-tied covariance matrices. In the standard form of semi-tied covariance matrices, the covariance matrix is decomposed into a highly shared decorrelating transform and a component-specific diagonal covariance matrix. The use of a factored decorrelating transform is presented in this paper. This factoring effectively increases the number of possible transforms without increasing the number of free parameters.
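The standard semi-tied decomposition described above can be written out as follows. The notation is an assumption chosen for illustration: m indexes the Gaussian component, r indexes the class of components that share a transform, and the decorrelating transform is taken to be the inverse of H.

```latex
\Sigma^{(m)} = H^{(r)} \, \Sigma^{(m)}_{\mathrm{diag}} \, H^{(r)\top},
\qquad
A^{(r)} = \bigl(H^{(r)}\bigr)^{-1}
```

With M components in d dimensions and a single shared transform, this form needs about Md + d^2 covariance parameters, compared with Md(d+1)/2 for full covariance matrices.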
Partially Observable SDE Models for Image Sequence Recognition Tasks
This paper explores a framework for recognition of image sequences using partially observable stochastic differential equation (SDE) models. Monte-Carlo importance sampling techniques are used for efficient estimation of sequence likelihoods and sequence likelihood gradients. Once the network dynamics are learned, we apply the SDE models to sequence recognition tasks in a manner similar to the way hidden Markov models (HMMs) are commonly applied. The potential advantage of SDEs over HMMs is the use of continuous state dynamics. We present encouraging results for a video sequence recognition task in which SDE models provided excellent performance when compared to hidden Markov models.
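Importance-sampling estimation of a likelihood can be sketched on a toy problem. The example below is not the SDE model of the paper: it assumes a static latent variable z ~ N(0, 1) with observation x | z ~ N(z, 1), and a Gaussian proposal with hypothetical parameters `q_mean` and `q_var`. It only shows the general estimator, the average of p(x, z) / q(z) over samples from q, computed in log space.

```python
import numpy as np

def log_normal_pdf(x, mean, var):
    """Log density of a univariate Gaussian."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

def is_log_likelihood(x, q_mean, q_var, n_samples=1000, seed=0):
    """Importance-sampling estimate of log p(x) for the toy model above."""
    rng = np.random.default_rng(seed)
    z = rng.normal(q_mean, np.sqrt(q_var), size=n_samples)  # samples from q
    log_w = (log_normal_pdf(z, 0.0, 1.0)        # log prior p(z)
             + log_normal_pdf(x, z, 1.0)        # log likelihood p(x | z)
             - log_normal_pdf(z, q_mean, q_var))  # minus log proposal q(z)
    m = log_w.max()                             # log-sum-exp for stability
    return m + np.log(np.mean(np.exp(log_w - m)))
```

In this toy model the exact marginal is x ~ N(0, 2), so the estimate can be checked directly. A proposal close to the posterior of z given x gives low-variance weights, which is the usual motivation for choosing the proposal carefully.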
Feature Correspondence: A Markov Chain Monte Carlo Approach
When trying to recover 3D structure from a set of images, the most difficult problem is establishing the correspondence between the measurements. Most existing approaches assume that features can be tracked across frames, whereas methods that exploit rigidity constraints to facilitate matching do so only under restricted camera motion. In this paper we propose a Bayesian approach that avoids the brittleness associated with singling out one "best" correspondence, and instead considers the distribution over all possible correspondences. We treat both a fully Bayesian approach that yields a posterior distribution, and a MAP approach that makes use of EM to maximize this posterior. We show how Markov chain Monte Carlo methods can be used to implement these techniques in practice, and present experimental results on real data.
Interactive Parts Model: An Application to Recognition of On-line Cursive Script
In this work, we introduce an Interactive Parts (IP) model as an alternative to Hidden Markov Models (HMMs). We tested both models on a database of on-line cursive script. We show that implementations of HMMs and the IP model, in which all letters are assumed to have the same average width, give comparable results. However, in contrast to HMMs, the IP model can handle duration modeling without an increase in computational complexity.
New Approaches Towards Robust and Adaptive Speech Recognition
In this paper, we discuss some new research directions in automatic speech recognition (ASR) which deviate somewhat from the usual approaches. More specifically, we will motivate and briefly describe new approaches based on multi-stream and multi-band ASR. These approaches extend the standard hidden Markov model (HMM) based approach by assuming that the different (frequency) channels representing the speech signal are processed by different (independent) "experts", each expert focusing on a different characteristic of the signal, and that the different stream likelihoods (or posteriors) are combined at some (temporal) stage to yield a global recognition output. As a further extension to multi-stream ASR, we will finally introduce a new approach, referred to as HMM2, where the HMM emission probabilities are estimated via state-specific, feature-based HMMs responsible for merging the stream information and modeling their possible correlation. Current automatic speech recognition systems are based on (context-dependent or context-independent) phone models described in terms of a sequence of hidden Markov model (HMM) states, where each HMM state is assumed to be characterized by a stationary probability density function.
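The combination of stream likelihoods into a global score can be sketched as below. The abstract does not say which combination rule is used, so the weighted log-linear rule here is an assumption, and the function name and argument layout are hypothetical.

```python
import numpy as np

def combine_streams(stream_log_likes, weights):
    """Weighted log-linear combination of per-stream class scores.

    stream_log_likes: array of shape (n_streams, n_classes), one row of
    log-likelihoods (or log-posteriors) per expert.
    weights: non-negative reliability weight per stream (assumed given).
    """
    ll = np.asarray(stream_log_likes, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()          # normalize so the weights sum to one
    return w @ ll            # combined score for each class

# Two streams scoring three classes; the second stream is trusted more.
scores = combine_streams([[-1.0, -2.0, -3.0],
                          [-3.0, -1.0, -2.0]], weights=[1.0, 3.0])
best_class = int(np.argmax(scores))
```

Giving a noisy frequency band a small weight reduces its influence on the decision, which is one way such a scheme could behave more robustly than a single full-band model.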
Rate-coded Restricted Boltzmann Machines for Face Recognition
We describe a neurally-inspired, unsupervised learning algorithm that builds a non-linear generative model for pairs of face images from the same individual. Individuals are then recognized by finding the highest relative probability pair among all pairs that consist of a test image and an image whose identity is known. Our method compares favorably with other methods in the literature. The generative model consists of a single layer of rate-coded, non-linear feature detectors and it has the property that, given a data vector, the true posterior probability distribution over the feature detector activities can be inferred rapidly without iteration or approximation. The weights of the feature detectors are learned by comparing the correlations of pixel intensities and feature activations in two phases: when the network is observing real data and when it is observing reconstructions of real data generated from the feature activations.
High-temperature Expansions for Learning Models of Nonnegative Data
Recent work has exploited boundedness of data in the unsupervised learning of new types of generative model. For nonnegative data it was recently shown that the maximum-entropy generative model is a Nonnegative Boltzmann Distribution, not a Gaussian distribution, when the model is constrained to match the first and second order statistics of the data. Learning for practical-sized problems is made difficult by the need to compute expectations under the model distribution. The computational cost of Markov chain Monte Carlo methods and the low fidelity of naive mean field techniques have led to increasing interest in advanced mean field theories and variational methods. Here I present a second-order mean-field approximation for the Nonnegative Boltzmann Machine model, obtained using a "high-temperature" expansion.
The Infinite Hidden Markov Model
We show that it is possible to extend hidden Markov models to have a countably infinite number of hidden states. By using the theory of Dirichlet processes we can implicitly integrate out the infinitely many transition parameters, leaving only three hyperparameters which can be learned from data. These three hyperparameters define a hierarchical Dirichlet process capable of capturing a rich set of transition dynamics. The three hyperparameters control the time scale of the dynamics, the sparsity of the underlying state-transition matrix, and the expected number of distinct hidden states in a finite sequence. In this framework it is also natural to allow the alphabet of emitted symbols to be infinite; consider, for example, symbols being possible words appearing in English text.