Undirected Networks
Diffusion of Credit in Markovian Models
This paper studies the problem of diffusion in Markovian models, such as hidden Markov models (HMMs), and how it makes learning long-term dependencies in sequences very difficult. Using results from Markov chain theory, we show that the problem of diffusion is reduced if the transition probabilities approach 0 or 1. Under this condition, standard HMMs have very limited modeling capabilities, but input/output HMMs can still perform interesting computations.
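The diffusion effect described above can be illustrated with a small sketch. It is not taken from the paper: the two 2-state transition matrices and the helper names (`row_spread`, `spread_after`) are assumptions chosen for illustration. The sketch multiplies a transition matrix by itself and measures how different the rows of the product remain; when the rows become identical, the state many steps ahead no longer depends on the starting state.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def row_spread(matrix):
    """Largest total-variation distance between any two rows of the matrix."""
    n = len(matrix)
    return max(
        0.5 * sum(abs(x - y) for x, y in zip(matrix[i], matrix[j]))
        for i in range(n)
        for j in range(n)
    )


def spread_after(transition, steps):
    """Row spread of the `steps`-fold product of a fixed transition matrix."""
    product = transition
    for _ in range(steps - 1):
        product = matmul(product, transition)
    return row_spread(product)


# Hypothetical example matrices (assumptions, not from the paper).
mixing = [[0.6, 0.4], [0.3, 0.7]]                  # probabilities far from 0 and 1
near_deterministic = [[0.99, 0.01], [0.01, 0.99]]  # probabilities close to 0 and 1

if __name__ == "__main__":
    for name, matrix in [("mixing", mixing), ("near-deterministic", near_deterministic)]:
        print(name, spread_after(matrix, 20))
```

In this toy setting the spread of the mixing chain collapses toward zero within a few steps, while the near-deterministic chain retains most of its dependence on the initial state, which matches the qualitative claim that diffusion is reduced as transition probabilities approach 0 or 1.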
Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems
Increasing attention has been paid to reinforcement learning algorithms in recent years, partly due to successes in the theoretical analysis of their behavior in Markov environments. If the Markov assumption is removed, however, generally neither the algorithms nor the analyses continue to be usable. We propose and analyze a new learning algorithm to solve a certain class of non-Markov decision problems. Our algorithm applies to problems in which the environment is Markov, but the learner has restricted access to state information. The algorithm involves a Monte-Carlo policy evaluation combined with a policy improvement method that is similar to that of Markov decision problems and is guaranteed to converge to a local maximum.
Learning Local Error Bars for Nonlinear Regression
We present a new method for obtaining local error bars for nonlinear regression, i.e., estimates of the confidence in predicted values that depend on the input. We approach this problem by applying a maximum-likelihood framework to an assumed distribution of errors. We demonstrate our method first on computer-generated data with locally varying, normally distributed target noise. We then apply it to laser data from the Santa Fe Time Series Competition where the underlying system noise is known quantization error and the error bars give local estimates of model misspecification. In both cases, the method also provides a weighted-regression effect that improves generalization performance.
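A minimal sketch of the maximum-likelihood idea in the abstract above, assuming a Gaussian error distribution whose variance depends on the input. The function names are hypothetical and the paper's actual network architecture and training procedure are not reproduced here. The sketch shows the per-example negative log-likelihood and its gradient with respect to the predicted mean, which is where the weighted-regression effect comes from: residuals at inputs with a large predicted variance are down-weighted.

```python
import math


def gaussian_nll(target, mean, variance):
    """Negative log-likelihood of `target` under N(mean, variance)."""
    return 0.5 * math.log(2.0 * math.pi * variance) + (target - mean) ** 2 / (2.0 * variance)


def nll_grad_mean(target, mean, variance):
    """Gradient of the NLL with respect to the predicted mean.

    The residual is divided by the predicted variance, so noisy regions
    contribute less to the update of the mean (weighted regression).
    """
    return (mean - target) / variance


if __name__ == "__main__":
    # Same residual, different predicted variances (illustrative values).
    for variance in (1.0, 4.0, 16.0):
        print(variance, gaussian_nll(2.0, 0.0, variance), nll_grad_mean(2.0, 0.0, variance))
```

For a fixed residual, this loss is smallest when the predicted variance equals the squared residual, so minimizing it pushes the variance output toward the local noise level.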
Handwritten Word Recognition using Contextual Hybrid Radial Basis Function Network/Hidden Markov Models
A hybrid and contextual radial basis function network/hidden Markov model off-line handwritten word recognition system is presented. The task assigned to the radial basis function networks is the estimation of emission probabilities associated with Markov states. The model is contextual because the estimation of emission probabilities takes into account the left context of the current image segment as represented by its predecessor in the sequence. The new system does not outperform the previous system without context but acts differently.
Hierarchical Recurrent Neural Networks for Long-Term Dependencies
We have already shown that extracting long-term dependencies from sequential data is difficult, both for deterministic dynamical systems such as recurrent networks, and probabilistic models such as hidden Markov models (HMMs) or input/output hidden Markov models (IOHMMs). In practice, to avoid this problem, researchers have used domain-specific a-priori knowledge to give meaning to the hidden or state variables representing past context. In this paper, we propose to use a more general type of a-priori knowledge, namely that the temporal dependencies are structured hierarchically. This implies that long-term dependencies are represented by variables with a long time scale. This principle is applied to a recurrent network which includes delays and multiple time scales.
Forward-backward retraining of recurrent neural networks
This paper describes the training of a recurrent neural network as the letter posterior probability estimator for a hidden Markov model, off-line handwriting recognition system. The supervised training algorithm, backpropagation through time, requires target outputs to be provided for each frame. Three methods for deriving these targets are presented. A novel method based upon the forward-backward algorithm is found to result in the recognizer with the lowest error rate.
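The following is a rough sketch of how per-frame targets could be derived with the forward-backward algorithm, assuming the targets are the state posteriors of a small discrete-state HMM. The model, the numbers, and the function name are illustrative assumptions, not the paper's recognizer, and no scaling is applied, so the sketch is only suitable for short sequences.

```python
def forward_backward(initial, transition, frame_likelihoods):
    """Return per-frame state posteriors for an HMM.

    initial[i]              : probability of starting in state i
    transition[i][j]        : probability of moving from state i to state j
    frame_likelihoods[t][i] : likelihood of frame t given state i
    """
    n = len(initial)
    num_frames = len(frame_likelihoods)

    # Forward pass: alpha[t][i] is the joint probability of frames 0..t and state i at t.
    alpha = [[0.0] * n for _ in range(num_frames)]
    for i in range(n):
        alpha[0][i] = initial[i] * frame_likelihoods[0][i]
    for t in range(1, num_frames):
        for j in range(n):
            incoming = sum(alpha[t - 1][i] * transition[i][j] for i in range(n))
            alpha[t][j] = frame_likelihoods[t][j] * incoming

    # Backward pass: beta[t][i] is the probability of frames t+1.. given state i at t.
    beta = [[1.0] * n for _ in range(num_frames)]
    for t in range(num_frames - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(
                transition[i][j] * frame_likelihoods[t + 1][j] * beta[t + 1][j]
                for j in range(n)
            )

    # Normalise alpha * beta per frame to obtain soft targets.
    posteriors = []
    for t in range(num_frames):
        weights = [alpha[t][i] * beta[t][i] for i in range(n)]
        total = sum(weights)
        posteriors.append([w / total for w in weights])
    return posteriors


if __name__ == "__main__":
    # Hypothetical two-state left-to-right model with three frames.
    targets = forward_backward(
        initial=[1.0, 0.0],
        transition=[[0.5, 0.5], [0.0, 1.0]],
        frame_likelihoods=[[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]],
    )
    for row in targets:
        print(row)
```

Each row of the result sums to one and could serve as a soft target vector for the corresponding frame, in contrast to hard targets obtained from a single best alignment.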
Factorial Hidden Markov Models
We present a framework for learning in hidden Markov models with distributed state representations. Within this framework, we derive a learning algorithm based on the Expectation-Maximization (EM) procedure for maximum likelihood estimation. Analogous to the standard Baum-Welch update rules, the M-step of our algorithm is exact and can be solved analytically. However, due to the combinatorial nature of the hidden state representation, the exact E-step is intractable. A simple and tractable mean field approximation is derived.
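To make the combinatorial growth mentioned above concrete, here is a small counting sketch. It assumes the distributed state consists of several independent chains with the same number of states each, which is an assumption made for illustration; the helper names are hypothetical.

```python
from itertools import product


def joint_states(num_chains, states_per_chain):
    """Enumerate every joint configuration of the distributed hidden state."""
    return list(product(range(states_per_chain), repeat=num_chains))


def parameter_counts(num_chains, states_per_chain):
    """Compare a flattened joint-state HMM with a factored representation."""
    joint = states_per_chain ** num_chains
    return {
        "joint_states": joint,
        # One transition matrix over all joint states.
        "flat_transition_entries": joint * joint,
        # One small transition matrix per chain.
        "factored_transition_entries": num_chains * states_per_chain ** 2,
    }


if __name__ == "__main__":
    print(len(joint_states(3, 2)))
    print(parameter_counts(10, 2))
```

The number of joint states grows exponentially with the number of chains, which is why an approximation that treats the chains separately becomes attractive once an exact computation over the joint state is too costly.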
Exploiting Tractable Substructures in Intractable Networks
We develop a refined mean field approximation for inference and learning in probabilistic neural networks. Our mean field theory, unlike most, does not assume that the units behave as independent degrees of freedom; instead, it exploits in a principled way the existence of large substructures that are computationally tractable. To illustrate the advantages of this framework, we show how to incorporate weak higher order interactions into a first-order hidden Markov model, treating the corrections (but not the first order structure) within mean field theory.
A New Approach to Hybrid HMM/ANN Speech Recognition using Mutual Information Neural Networks
This paper presents a new approach to speech recognition with hybrid HMM/ANN technology. While the standard approach to hybrid HMM/ANN systems is based on the use of neural networks as posterior probability estimators, the new approach is based on the use of mutual information neural networks trained with a special learning algorithm in order to maximize the mutual information between the input classes of the network and its resulting sequence of firing output neurons during training. It is shown in this paper that such a neural network is an optimal neural vector quantizer for a discrete hidden Markov model system trained on maximum likelihood principles. One of the main advantages of this approach is the fact that such neural networks can be easily combined with HMMs of any complexity with context-dependent capabilities. It is shown that the resulting hybrid system achieves very high recognition rates, which are already on the same level as the best conventional HMM systems with continuous parameters, even though the capabilities of the mutual information neural networks are not yet entirely exploited.
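As a hedged illustration of the quantity being maximized, the sketch below estimates the mutual information between input classes and firing output neurons from a table of co-occurrence counts. The count tables and the function name are assumptions for illustration; the paper's learning algorithm itself is not reproduced.

```python
import math


def mutual_information(joint_counts):
    """Mutual information in bits from a table of co-occurrence counts.

    joint_counts[c][k] is how often input class c coincided with
    output neuron k being the firing (winning) neuron.
    """
    total = sum(sum(row) for row in joint_counts)
    class_totals = [sum(row) for row in joint_counts]
    neuron_totals = [sum(column) for column in zip(*joint_counts)]

    information = 0.0
    for c, row in enumerate(joint_counts):
        for k, count in enumerate(row):
            if count > 0:  # empty cells contribute nothing
                joint_probability = count / total
                ratio = count * total / (class_totals[c] * neuron_totals[k])
                information += joint_probability * math.log2(ratio)
    return information


if __name__ == "__main__":
    # Hypothetical count tables: a perfectly informative quantizer and an uninformative one.
    print(mutual_information([[5, 0], [0, 5]]))
    print(mutual_information([[5, 5], [5, 5]]))
```

A quantizer whose firing neuron identifies the class perfectly attains the full class entropy, while one whose output is independent of the class carries no information, so this score gives a direct way to compare candidate quantizers on labeled data.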