Goto

Collaborating Authors

 Undirected Networks


On a Modification to the Mean Field EM Algorithm in Factorial Learning

Neural Information Processing Systems

A modification is described to the use of mean field approximations in the E step of EM algorithms for analysing data from latent structure models, as described by Ghahramani (1995), among others. The modification involves second-order Taylor approximations to expectations computed in the E step. The potential benefits of the method are illustrated using very simple latent profile models.


On a Modification to the Mean Field EM Algorithm in Factorial Learning

Neural Information Processing Systems

A modification is described to the use of mean field approximations inthe E step of EM algorithms for analysing data from latent structure models, as described by Ghahramani (1995), among others. Themodification involves second-order Taylor approximations to expectations computed in the E step. The potential benefits of the method are illustrated using very simple latent profile models. 1 Introduction Ghahramani (1995) advocated the use of mean field methods as a means to avoid the heavy computation involved in the E step of the EM algorithm used for estimating parameters within a certain latent structure model, and Ghahramani & Jordan (1995) used the same ideas in a more complex situation. Dunmur & Titterington (1996a) identified Ghahramani's model as a so-called latent profile model, they observed that Zhang (1992,1993) had used mean field methods for a similar purpose, and they showed, in a simulation study based on very simple examples, that the mean field version of the EM algorithm often performed very respectably. By this it is meant that, when data were generated from the model under analysis, the estimators of the underlying parameters were efficient, judging by empirical results, especially in comparison with estimators obtained by employing the'correct' EM algorithm: the examples therefore had to be simple enough that the correct EM algorithm is numerically feasible, although any success reported for the mean field 432 A. P. Dunmur and D. M. Titterington version is, one hopes, an indication that the method will also be adequate in more complex situations in which the correct EM algorithm is not implementable because of computational complexity. In spite of the above positive remarks, there were circumstances in which there was a perceptible, if not dramatic, lack of efficiency in the simple (naive) mean field estimators, and the objective of this contribution is to propose and investigate ways of refining the method so as to improve performance without detracting from the appealing, and frequently essential, simplicity of the approach. The procedure used here is based on a second order correction to the naive mean field well known in statistical physics and sometimes called the cavity or TAP method (Mezard, Parisi & Virasoro, 1987). It has been applied recently in cluster analysis (Hofmann & Buhmann, 1996). In Section 2 we introduce the structure of our model, Section 3 explains the refined mean field approach, Section 4 provides numerical results, and Section 5 contains a statement of our conclusions.


Hidden Markov Decision Trees

Neural Information Processing Systems

We study a time series model that can be viewed as a decision tree with Markov temporal structure. The model is intractable for exact calculations, thus we utilize variational approximations. We consider three different distributions for the approximation: one in which the Markov calculations are performed exactly and the layers of the decision tree are decoupled, one in which the decision tree calculations are performed exactly and the time steps of the Markov chain are decoupled, and one in which a Viterbi-like assumption is made to pick out a single most likely state sequence.


Approximate Solutions to Optimal Stopping Problems

Neural Information Processing Systems

We propose and analyze an algorithm that approximates solutions to the problem of optimal stopping in a discounted irreducible aperiodic Markovchain. The scheme involves the use of linear combinations offixed basis functions to approximate a Q-function. The weights of the linear combination are incrementally updated through an iterative process similar to Q-Iearning, involving simulation ofthe underlying Markov chain. Due to space limitations, we only provide an overview of a proof of convergence (with probability 1)and bounds on the approximation error. This is the first theoretical result that establishes the soundness of a Q-Iearninglike algorithmwhen combined with arbitrary linear function approximators tosolve a sequential decision problem.


Analysis of Temporal-Diffference Learning with Function Approximation

Neural Information Processing Systems

The algorithm weanalyze performs online updating of a parameter vector during a single endless trajectory of an aperiodic irreducible finite state Markov chain. Results include convergence (with probability 1), a characterization of the limit of convergence, and a bound on the resulting approximation error. In addition to establishing new and stronger results than those previously available, our analysis is based on a new line of reasoning that provides new intuition about the dynamics of temporal-difference learning. Furthermore, we discuss the implications of two counterexamples with regards to the Significance of online updating and linearly parameterized function approximators. 1 INTRODUCTION The problem of predicting the expected long-term future cost (or reward) of a stochastic dynamic system manifests itself in both time-series prediction and control. Anexample in time-series prediction is that of estimating the net present value of a corporation, as a discounted sum of its future cash flows, based on the current state of its operations. In control, the ability to predict long-term future cost as a function of state enables the ranking of alternative states in order to guide decision-making. Indeed, such predictions constitute the cost-to-go function that is central to dynamic programming and optimal control (Bertsekas, 1995). Temporal-difference learning, originally proposed by Sutton (1988), is a method for approximating long-term future cost as a function of current state.


Interpreting Images by Propagating Bayesian Beliefs

Neural Information Processing Systems

A central theme of computational vision research has been the realization thatreliable estimation of local scene properties requires propagating measurements across the image. Many authors have therefore suggested solving vision problems using architectures of locally connected units updating their activity in parallel. Unfortunately, theconvergence of traditional relaxation methods on such architectures has proven to be excruciatingly slow and in general they do not guarantee that the stable point will be a global minimum. In this paper we show that an architecture in which Bayesian Beliefs aboutimage properties are propagated between neighboring units yields convergence times which are several orders of magnitude fasterthan traditional methods and avoids local minima. In particular our architecture is non-iterative in the sense of Marr [5]: at every time step, the local estimates at a given location are optimal giventhe information which has already been propagated to that location. We illustrate the algorithm's performance on real images and compare it to several existing methods. 1 Theory The essence of our approach is shown in figure 1. Figure 1a shows the prototypical ill-posed problem: interpolation of a function from sparse data.


A New Approach to Hybrid HMM/ANN Speech Recognition using Mutual Information Neural Networks

Neural Information Processing Systems

This paper presents a new approach to speech recognition with hybrid HMM/ANN technology. While the standard approach to hybrid HMMI ANN systems is based on the use of neural networks as posterior probability estimators, the new approach is based on the use of mutual information neural networks trained with a special learning algorithm in order to maximize the mutual information between the input classes of the network and its resulting sequence of firing output neurons during training. It is shown in this paper that such a neural network is an optimal neural vector quantizer for a discrete hidden Markov model system trained on Maximum Likelihood principles. One of the main advantages of this approach is the fact, that such neural networks can be easily combined with HMM's of any complexity with context-dependent capabilities. It is shown that the resulting hybrid system achieves very high recognition rates, which are now already on the same level as the best conventional HMM systems with continuous parameters, and the capabilities of the mutual information neural networks are not yet entirely exploited.


Dynamic Features for Visual Speechreading: A Systematic Comparison

Neural Information Processing Systems

Humans use visual as well as auditory speech signals to recognize spoken words. A variety of systems have been investigated for performing thistask. The main purpose of this research was to systematically comparethe performance of a range of dynamic visual features on a speechreading task. We have found that normalization ofimages to eliminate variation due to translation, scale, and planar rotation yielded substantial improvements in generalization performanceregardless of the visual representation used. In addition, the dynamic information in the difference between successive framesyielded better performance than optical-flow based approaches, and compression by local low-pass filtering worked surprisingly betterthan global principal components analysis (PCA). These results are examined and possible explanations are explored.


A Micropower Analog VLSI HMM State Decoder for Wordspotting

Neural Information Processing Systems

We describe the implementation of a hidden Markov model state decoding system, a component for a wordspotting speech recognition system.The key specification for this state decoder design is microwatt power dissipation; this requirement led to a continuoustime, analogcircuit implementation. We characterize the operation of a 10-word (81 state) state decoder test chip.


Clustering Sequences with Hidden Markov Models

Neural Information Processing Systems

This paper discusses a probabilistic model-based approach to clustering sequences,using hidden Markov models (HMMs) . The problem can be framed as a generalization of the standard mixture model approach to clustering in feature space. Two primary issues are addressed. First, a novel parameter initialization procedure is proposed, and second, the more difficult problem of determining the number of clusters K, from the data, is investigated. Experimental resultsindicate that the proposed techniques are useful for revealing hidden cluster structure in data sets of sequences.