Markov Models
Training Algorithms for Hidden Markov Models using Entropy Based Distance Functions
By adapting a framework used for supervised learning, we construct iterative algorithms that maximize the likelihood of the observations while also attempting to stay "close" to the current estimated parameters. We use a bound on the relative entropy between the two HMMs as a distance mea(cid:173) sure between them. The result is new iterative training algorithms which are similar to the EM (Baum-Welch) algorithm for training HMMs. The proposed algorithms are composed of a step similar to the expectation step of Baum-Welch and a new update of the parameters which replaces the maximization (re-estimation) step. The algorithm takes only negligi(cid:173) bly more time per iteration and an approximated version uses the same expectation step as Baum-Welch.
Hidden Markov Decision Trees
We study a time series model that can be viewed as a decision tree with Markov temporal structure. The model is intractable for exact calculations, thus we utilize variational approximations. We consider three different distributions for the approximation: one in which the Markov calculations are performed exactly and the layers of the decision tree are decoupled, one in which the decision tree calculations are performed exactly and the time steps of the Markov chain are decoupled, and one in which a Viterbi-like assumption is made to pick out a single most likely state sequence.
Clustering Sequences with Hidden Markov Models
This paper discusses a probabilistic model-based approach to clus(cid:173) tering sequences, using hidden Markov models (HMMs) . The prob(cid:173) lem can be framed as a generalization of the standard mixture model approach to clustering in feature space. Two primary issues are addressed. First, a novel parameter initialization procedure is proposed, and second, the more difficult problem of determining the number of clusters K, from the data, is investigated. Experi(cid:173) mental results indicate that the proposed techniques are useful for revealing hidden cluster structure in data sets of sequences.
A Micropower Analog VLSI HMM State Decoder for Wordspotting
We describe the implementation of a hidden Markov model state decoding system, a component for a wordspotting speech recogni(cid:173) tion system. The key specification for this state decoder design is microwatt power dissipation; this requirement led to a continuous(cid:173) time, analog circuit implementation. We characterize the operation of a 10-word (81 state) state decoder test chip.
Modeling Acoustic Correlations by Factor Analysis
Hidden Markov models (HMMs) for automatic speech recognition rely on high dimensional feature vectors to summarize the short(cid:173) time properties of speech. Correlations between features can arise when the speech signal is non-stationary or corrupted by noise. We investigate how to model these correlations using factor analysis, a statistical method for dimensionality reduction . Factor analysis uses a small number of parameters to model the covariance struc(cid:173) ture of high dimensional data. These parameters are estimated by an Expectation-Maximization (EM) algorithm that can be em(cid:173) bedded in the training procedures for HMMs.
An Improved Policy Iteration Algorithm for Partially Observable MDPs
A new policy iteration algorithm for partially observable Markov decision processes is presented that is simpler and more efficient than an earlier policy iteration algorithm of Sondik (1971,1978). The key simplification is representation of a policy as a finite-state controller. This representation makes policy evaluation straightforward. The pa(cid:173) per's contribution is to show that the dynamic-programming update used in the policy improvement step can be interpreted as the trans(cid:173) formation of a finite-state controller into an improved finite-state con(cid:173) troller. The new algorithm consistently outperforms value iteration as an approach to solving infinite-horizon problems.
How to Dynamically Merge Markov Decision Processes
We are frequently called upon to perform multiple tasks that com(cid:173) pete for our attention and resource. Often we know the optimal solution to each task in isolation; in this paper, we describe how this knowledge can be exploited to efficiently find good solutions for doing the tasks in parallel. We formulate this problem as that of dynamically merging multiple Markov decision processes (MDPs) into a composite MDP, and present a new theoretically-sound dy(cid:173) namic programming algorithm for finding an optimal policy for the composite MDP. We analyze various aspects of our algorithm and illustrate its use on a simple merging problem. Every day, we are faced with the problem of doing mUltiple tasks in parallel, each of which competes for our attention and resource.
Analysis of Drifting Dynamics with Neural Network Hidden Markov Models
We present a method for the analysis of nonstationary time se(cid:173) ries with multiple operating modes. In particular, it is possible to detect and to model both a switching of the dynamics and a less abrupt, time consuming drift from one mode to another. This is achieved in two steps. Then, the trained experts are used in a hidden Markov model that allows to model drifts. An application to physiological wake/sleep data demonstrates that analysis and modeling of real-world time series can be improved when the drift paradigm is taken into account.
On the Separation of Signals from Neighboring Cells in Tetrode Recordings
We discuss a solution to the problem of separating waveforms pro(cid:173) duced by multiple cells in an extracellular neural recording. We take an explicitly probabilistic approach, using latent-variable mod(cid:173) els of varying sophistication to describe the distribution of wave(cid:173) forms produced by a single cell. The models range from a single Gaussian distribution of waveforms for each cell to a mixture of hidden Markov models. We stress the overall statistical structure of the approach, allowing the details of the generative model chosen to depend on the specific neural preparation.
Nonlinear Markov Networks for Continuous Variables
We address the problem oflearning structure in nonlinear Markov networks with continuous variables. This can be viewed as non-Gaussian multidi(cid:173) mensional density estimation exploiting certain conditional independencies in the variables. Markov networks are a graphical way of describing con(cid:173) ditional independencies well suited to model relationships which do not ex(cid:173) hibit a natural causal ordering. We use neural network structures to model the quantitative relationships between variables. The main focus in this pa(cid:173) per will be on learning the structure for the purpose of gaining insight into the underlying process.