Markov Models
Convergence and Pattern-Stabilization in the Boltzmann Machine
The Boltzmann Machine has been introduced as a means to perform global optimization for multimodal objective functions using the principles of simulated annealing. In this paper we consider its utility as a spurious-free content-addressable memory, and provide bounds on its performance in this context. We show how to exploit the machine's ability to escape local minima, in order to use it, at a constant temperature, for unambiguous associative pattern-retrieval in noisy environments. An association rule, which creates a sphere of influence around each stored pattern, is used along with the Machine's dynamics to match the machine's noisy input with one of the pre-stored patterns. Spurious fixed points, whose regions of attraction are not recognized by the rule, are skipped, due to the Machine's finite probability to escape from any state.
Links Between Markov Models and Multilayer Perceptrons
Hidden Markov models are widely used for automatic speech recognition. They inherently incorporate the sequential character of the speech signal and are statistically trained. However, the a-priori choice of the model topology limits their flexibility. Another drawback of these models is their weak discriminating power. Multilayer perceptrons are now promising tools in the connectionist approach for classification problems and have already been successfully tested on speech recognition problems. However, the sequential nature of the speech signal remains difficult to handle in that kind of machine.
The Boltzmann Perceptron Network: A Multi-Layered Feed-Forward Network Equivalent to the Boltzmann Machine
The concept of the stochastic Boltzmann machine (BM) is attractive for decision making and pattern classification purposes since the probability of attaining the network states is a function of the network energy. Hence, the probability of attaining particular energy minima may be associated with the probabilities of making certain decisions (or classifications). However, because of its stochastic nature, the complexity of the BM is fairly high and therefore such networks are not very likely to be used in practice. In this paper we suggest a way to alleviate this drawback by converting the stochastic BM into a deterministic network which we call the Boltzmann Perceptron Network (BPN). The BPN is functionally equivalent to the BM but has a feed-forward structure and low complexity. The conditions under which such a conversion is feasible are given.
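The statement that state probabilities are a function of the network energy can be illustrated with a minimal sketch. It assumes the standard Boltzmann distribution P(s) proportional to exp(-E(s)/T) over +/-1 units with a quadratic energy. The function names, the three-unit weights, and the biases are hypothetical and chosen only for illustration; this is not the BPN construction from the paper.

```python
import itertools
import math


def energy(state, weights, biases):
    """Energy of a +/-1 state: E = -sum_{i<j} w_ij s_i s_j - sum_i b_i s_i."""
    n = len(state)
    pair_term = sum(
        weights[i][j] * state[i] * state[j]
        for i in range(n)
        for j in range(i + 1, n)
    )
    bias_term = sum(biases[i] * state[i] for i in range(n))
    return -pair_term - bias_term


def boltzmann_distribution(weights, biases, temperature=1.0):
    """Exact equilibrium distribution, found by enumerating all 2^n states."""
    n = len(biases)
    states = list(itertools.product((-1, 1), repeat=n))
    energies = [energy(s, weights, biases) for s in states]
    e_min = min(energies)  # shift energies for numerical stability
    unnormalised = [math.exp(-(e - e_min) / temperature) for e in energies]
    partition = sum(unnormalised)
    return {s: u / partition for s, u in zip(states, unnormalised)}


# Hypothetical symmetric weights and biases for a three-unit network.
weights = [[0.0, 1.0, -0.5], [1.0, 0.0, 0.25], [-0.5, 0.25, 0.0]]
biases = [0.1, 0.0, -0.2]
dist = boltzmann_distribution(weights, biases)
```

Enumeration is only feasible for very small networks. It is used here because it makes the link between low energy and high probability directly visible.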
A Continuous Speech Recognition System Embedding MLP into HMM
We are developing a phoneme-based continuous speech recognition system that embeds a multilayer perceptron (MLP) into a hidden Markov model (HMM). In [Bourlard & Wellekens], it was shown that MLPs were approximating Maximum a Posteriori (MAP) probabilities and could thus be embedded as an emission probability estimator in HMMs. It is shown here that word recognition performance for a simple discrete density HMM system appears to be somewhat better when MLP methods are used to estimate the emission probabilities.
HMM Speech Recognition with Neural Net Discrimination
Two approaches were explored which integrate neural net classifiers with Hidden Markov Model (HMM) speech recognizers. Both attempt to improve speech pattern discrimination while retaining the temporal processing advantages of HMMs. One approach used neural nets to provide second-stage discrimination following an HMM recognizer. On a small vocabulary task, Radial Basis Function (RBF) and back-propagation neural nets reduced the error rate substantially (from 7.9% to 4.2% for the RBF classifier). In a larger vocabulary task, neural net classifiers did not reduce the error rate.
Coupled Markov Random Fields and Mean Field Theory
In recent years many researchers have investigated the use of Markov Random Fields (MRFs) for computer vision. They can be applied, for example, to reconstruct surfaces from sparse and noisy depth data coming from the output of a visual process, or to integrate early vision processes to label physical discontinuities. In this paper we show that by applying mean field theory to those MRF models a class of neural networks is obtained. Those networks can speed up the solution for the MRF models. The method is not restricted to computer vision.
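How mean field theory turns an MRF into a deterministic, network-like update can be sketched as follows. The sketch swaps in the simplest case, a binary (Ising-type) MRF with +/-1 sites, rather than the coupled surface and discontinuity fields of the paper. The function name, the chain couplings, and the external fields are hypothetical.

```python
import math


def mean_field_ising(couplings, fields, temperature=1.0, sweeps=200, tol=1e-9):
    """Iterate m_i = tanh((h_i + sum_j J_ij m_j) / T) until the means settle."""
    n = len(fields)
    means = [0.0] * n
    for _ in range(sweeps):
        max_change = 0.0
        for i in range(n):
            # Each stochastic neighbour is replaced by its current mean value.
            local_field = fields[i] + sum(
                couplings[i][j] * means[j] for j in range(n) if j != i
            )
            updated = math.tanh(local_field / temperature)
            max_change = max(max_change, abs(updated - means[i]))
            means[i] = updated
        if max_change < tol:
            break
    return means


# Hypothetical four-site chain with nearest-neighbour couplings.
couplings = [
    [0.0, 0.5, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.5, 0.0],
]
fields = [0.3, 0.0, 0.0, -0.1]
means = mean_field_ising(couplings, fields)
```

Each update has the form of a unit with a tanh nonlinearity applied to a weighted sum of its neighbours, which is the sense in which a neural network arises from the mean field equations in this simplified setting.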
Training Stochastic Model Recognition Algorithms as Networks can Lead to Maximum Mutual Information Estimation of Parameters
One of the attractions of neural network approaches to pattern recognition is the use of a discrimination-based training method. We show that once we have modified the output layer of a multilayer perceptron to provide mathematically correct probability distributions, and replaced the usual squared error criterion with a probability-based score, the result is equivalent to Maximum Mutual Information training, which has been used successfully to improve the performance of hidden Markov models for speech recognition. If the network is specially constructed to perform the recognition computations of a given kind of stochastic model based classifier then we obtain a method for discrimination-based training of the parameters of the models. Examples include an HMM-based word discriminator, which we call an 'Alphanet'.
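The two modifications described above can be sketched in a few lines. The abstract does not name the output function or the score, so this sketch assumes a normalised exponential (softmax) output layer and the log probability of the correct class as the score. All function names and the example activations are hypothetical.

```python
import math


def softmax(activations):
    """Map raw output activations to a valid probability distribution."""
    shift = max(activations)  # subtract the maximum for numerical stability
    exps = [math.exp(a - shift) for a in activations]
    total = sum(exps)
    return [e / total for e in exps]


def log_probability_score(activations, correct_class):
    """Probability-based score: log of the probability given to the true class."""
    return math.log(softmax(activations)[correct_class])


def score_gradient(activations, correct_class):
    """Gradient of the score with respect to each activation: delta_kc - p_k."""
    probs = softmax(activations)
    return [
        (1.0 if k == correct_class else 0.0) - p for k, p in enumerate(probs)
    ]


# Hypothetical activations for a three-class problem; class 0 is correct.
activations = [2.0, 0.5, -1.0]
probs = softmax(activations)
score = log_probability_score(activations, 0)
gradient = score_gradient(activations, 0)
```

Raising the score for the correct class necessarily lowers the probability of the competing classes, because the outputs are normalised to sum to one. That coupling is what makes the criterion discriminative.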
A Method for the Efficient Design of Boltzmann Machines for Classification Problems
We introduce a method for the efficient design of a Boltzmann machine (or a Hopfield net) that computes an arbitrary given Boolean function f. This method is based on an efficient simulation of acyclic circuits with threshold gates by Boltzmann machines. As a consequence we can show that various concrete Boolean functions f that are relevant for classification problems can be computed by scalable Boltzmann machines that are guaranteed to converge to their global maximum configuration with high probability after constantly many steps.
RecNorm: Simultaneous Normalisation and Classification applied to Speech Recognition
A particular form of neural network is described, which has terminals for acoustic patterns, class labels and speaker parameters. A method of training this network to "tune in" the speaker parameters to a particular speaker is outlined, based on a trick for converting a supervised network to an unsupervised mode. We describe experiments using this approach in isolated word recognition based on whole-word hidden Markov models. The results indicate an improvement over speaker-independent performance and, for unlabelled data, a performance close to that achieved on labelled data.
Connectionist Approaches to the Use of Markov Models for Speech Recognition
Previous work has shown the ability of Multilayer Perceptrons (MLPs) to estimate emission probabilities for Hidden Markov Models (HMMs). The advantages of a speech recognition system incorporating both MLPs and HMMs are better discrimination and the ability to incorporate multiple sources of evidence (features, temporal context) without restrictive assumptions of distributions or statistical independence. This paper presents results on the speaker-dependent portion of DARPA's English language Resource Management database. Results support the previously reported utility of MLP probability estimation for continuous speech recognition. An additional approach we are pursuing is to use MLPs as nonlinear predictors for autoregressive HMMs.