Undirected Networks
Boltzmann Machine Learning Using Mean Field Theory and Linear Response Correction
Kappen, Hilbert J., Ortiz, Francisco de Borja Rodríguez
We present a new approximate learning algorithm for Boltzmann Machines, using a systematic expansion of the Gibbs free energy to second order in the weights. The linear response correction to the correlations is given by the Hessian of the Gibbs free energy. The computational complexity of the algorithm is cubic in the number of neurons. We compare the performance of the exact BM learning algorithm with first order (Weiss) mean field theory and second order (TAP) mean field theory. The learning task consists of a fully connected Ising spin glass model on 10 neurons. We conclude that 1) the method works well for paramagnetic problems, 2) the TAP correction gives a significant improvement over the Weiss mean field theory, both for paramagnetic and spin glass problems, and 3) the inclusion of diagonal weights improves the Weiss approximation for paramagnetic problems, but not for spin glass problems.
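As a rough illustration of where the cubic cost comes from, the sketch below solves for weights and thresholds from clamped statistics using first order (Weiss) mean field theory with a linear response correction. It assumes ±1 spins and the closed form w_ij = delta_ij / (1 - m_i^2) - (C^-1)_ij, theta_i = atanh(m_i) - sum_j w_ij m_j, which is our reading of the method rather than a formula quoted from the abstract. The TAP correction is omitted, and the names `invert` and `fit_weiss_linear_response` are hypothetical.

```python
import math


def invert(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting.

    This O(n^3) step is what makes the overall procedure cubic in n.
    """
    n = len(a)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * u for v, u in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_weiss_linear_response(m, C):
    """Assumed closed form (Weiss mean field plus linear response).

    m : clamped means <s_i> of the +/-1 units, each strictly inside (-1, 1)
    C : clamped covariances <s_i s_j> - m_i m_j (diagonal is 1 - m_i^2)
    Returns (w, theta); the diagonal weights w_ii are kept.
    """
    n = len(m)
    C_inv = invert(C)
    w = [[(1.0 / (1.0 - m[i] ** 2) if i == j else 0.0) - C_inv[i][j]
          for j in range(n)]
         for i in range(n)]
    theta = [math.atanh(m[i]) - sum(w[i][j] * m[j] for j in range(n))
             for i in range(n)]
    return w, theta
```

For two units with zero means and covariance 0.5, `fit_weiss_linear_response([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]])` returns a positive coupling between the units and a negative diagonal weight on each, with zero thresholds.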
On the Separation of Signals from Neighboring Cells in Tetrode Recordings
Sahani, Maneesh, Pezaris, John S., Andersen, Richard A.
We discuss a solution to the problem of separating waveforms produced by multiple cells in an extracellular neural recording. We take an explicitly probabilistic approach, using latent-variable models of varying sophistication to describe the distribution of waveforms produced by a single cell. The models range from a single Gaussian distribution of waveforms for each cell to a mixture of hidden Markov models. We stress the overall statistical structure of the approach, allowing the details of the generative model chosen to depend on the specific neural preparation.
Comparison of Human and Machine Word Recognition
Schenkel, Markus, Latimer, Cyril, Jabri, Marwan A.
We present a study of word recognition rates for heavily degraded documents. We compare human with machine reading capabilities in a series of experiments that explore the interaction of word/non-word recognition, word frequency and legality of non-words with degradation level. We also study the influence of character segmentation, and compare human performance with that of our artificial neural network model for reading. We found that the proposed computer model uses word context as efficiently as humans, but performs slightly worse on the pure character recognition task.
1 Introduction
Optical Character Recognition (OCR) of machine-print document images has matured considerably during the last decade. Recognition rates as high as 99.5% have been reported on good quality documents. However, for lower image resolutions (200 dpi and below), noisy images, and images with blur or skew, the recognition rate declines considerably. In bad quality documents, character segmentation is as big a problem as the actual character recognition.
Approximate Solutions to Optimal Stopping Problems
Tsitsiklis, John N., Van Roy, Benjamin
We propose and analyze an algorithm that approximates solutions to the problem of optimal stopping in a discounted irreducible aperiodic Markov chain. The scheme involves the use of linear combinations of fixed basis functions to approximate a Q-function. The weights of the linear combination are incrementally updated through an iterative process similar to Q-learning, involving simulation of the underlying Markov chain. Due to space limitations, we only provide an overview of a proof of convergence (with probability 1) and bounds on the approximation error. This is the first theoretical result that establishes the soundness of a Q-learning-like algorithm when combined with arbitrary linear function approximators to solve a sequential decision problem.
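The scheme described above can be sketched on a small finite chain. The version below assumes a per-step continuation reward g(x), a stopping reward G(x), and a Q-function that estimates the value of continuing from x; the exact update used in the paper may differ. The function names, the step-size schedule, and the parameters `step0` and `decay` are hypothetical choices made for illustration.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def q_learning_optimal_stopping(P, g, G, features, alpha, steps,
                                step0=0.5, decay=1000.0, seed=0):
    """Fit weights r so that dot(r, features[x]) approximates the value of continuing.

    P        : row-stochastic transition matrix (list of lists)
    g        : reward received in state x when continuing
    G        : reward received in state x when stopping
    features : fixed basis-function values, one vector per state
    alpha    : discount factor in (0, 1)
    """
    rng = random.Random(seed)
    states = list(range(len(P)))
    r = [0.0] * len(features[0])
    x = 0
    for t in range(steps):
        # Simulate one transition of the underlying Markov chain.
        y = rng.choices(states, weights=P[x])[0]
        # At the next state the agent takes the better of stopping or continuing.
        target = g[x] + alpha * max(G[y], dot(r, features[y]))
        delta = target - dot(r, features[x])
        gamma = step0 / (1.0 + t / decay)  # assumed decaying step size
        r = [ri + gamma * delta * pi for ri, pi in zip(r, features[x])]
        x = y
    return r


def stopping_policy(r, G, features):
    """Stop in state x when the stopping reward is at least the estimated continuation value."""
    return [G[x] >= dot(r, features[x]) for x in range(len(G))]
```

With one-hot features this reduces to a tabular method; coarser basis functions trade accuracy for fewer weights, which is the regime the approximation-error bounds address.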
Analysis of Temporal-Difference Learning with Function Approximation
Tsitsiklis, John N., Van Roy, Benjamin
We present new results about the temporal-difference learning algorithm, as applied to approximating the cost-to-go function of a Markov chain using linear function approximators. The algorithm we analyze performs online updating of a parameter vector during a single endless trajectory of an aperiodic irreducible finite state Markov chain. Results include convergence (with probability 1), a characterization of the limit of convergence, and a bound on the resulting approximation error. In addition to establishing new and stronger results than those previously available, our analysis is based on a new line of reasoning that provides new intuition about the dynamics of temporal-difference learning. Furthermore, we discuss the implications of two counterexamples with regard to the significance of online updating and linearly parameterized function approximators.
1 INTRODUCTION
The problem of predicting the expected long-term future cost (or reward) of a stochastic dynamic system manifests itself in both time-series prediction and control.
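The online update analyzed in the abstract above can be illustrated with a minimal TD(0) sketch on a small finite chain. It covers only the zero-trace case, and it assumes a discounted cost-to-go and a decaying step size; the function name `td0_linear` and the parameters `step0` and `decay` are hypothetical, not taken from the paper.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def td0_linear(P, cost, features, alpha, steps,
               step0=0.5, decay=1000.0, seed=0):
    """Online TD(0) with a linear approximator along one simulated trajectory.

    P        : row-stochastic transition matrix (list of lists)
    cost     : cost incurred in each state
    features : feature vector for each state
    alpha    : discount factor in (0, 1)
    Returns the weight vector r, with dot(r, features[x]) approximating the cost-to-go of x.
    """
    rng = random.Random(seed)
    states = list(range(len(P)))
    r = [0.0] * len(features[0])
    x = 0
    for t in range(steps):
        y = rng.choices(states, weights=P[x])[0]
        # Temporal difference: one-step cost plus discounted next estimate, minus current estimate.
        delta = cost[x] + alpha * dot(r, features[y]) - dot(r, features[x])
        gamma = step0 / (1.0 + t / decay)  # assumed decaying step size
        r = [ri + gamma * delta * pi for ri, pi in zip(r, features[x])]
        x = y
    return r
```

The parameter vector is updated after every transition of a single trajectory, so states are sampled according to how often the chain visits them. That sampling is the online character whose significance the counterexamples discuss.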
A New Approach to Hybrid HMM/ANN Speech Recognition using Mutual Information Neural Networks
Rigoll, Gerhard, Neukirchen, Christoph
This paper presents a new approach to speech recognition with hybrid HMM/ANN technology. While the standard approach to hybrid HMM/ANN systems is based on the use of neural networks as posterior probability estimators, the new approach is based on the use of mutual information neural networks trained with a special learning algorithm in order to maximize the mutual information between the input classes of the network and its resulting sequence of firing output neurons during training. It is shown in this paper that such a neural network is an optimal neural vector quantizer for a discrete hidden Markov model system trained on Maximum Likelihood principles. One of the main advantages of this approach is the fact that such neural networks can be easily combined with HMMs of any complexity with context-dependent capabilities. It is shown that the resulting hybrid system achieves very high recognition rates, which are now already on the same level as the best conventional HMM systems with continuous parameters, and the capabilities of the mutual information neural networks are not yet entirely exploited.
Dynamic Features for Visual Speechreading: A Systematic Comparison
Gray, Michael S., Movellan, Javier R., Sejnowski, Terrence J.
Humans use visual as well as auditory speech signals to recognize spoken words. A variety of systems have been investigated for performing this task. The main purpose of this research was to systematically compare the performance of a range of dynamic visual features on a speechreading task. We have found that normalization of images to eliminate variation due to translation, scale, and planar rotation yielded substantial improvements in generalization performance regardless of the visual representation used. In addition, the dynamic information in the difference between successive frames yielded better performance than optical-flow based approaches, and compression by local low-pass filtering worked surprisingly better than global principal components analysis (PCA). These results are examined and possible explanations are explored.
A Micropower Analog VLSI HMM State Decoder for Wordspotting
Lazzaro, John, Wawrzynek, John, Lippmann, Richard P.
We describe the implementation of a hidden Markov model state decoding system, a component for a wordspotting speech recognition system. The key specification for this state decoder design is microwatt power dissipation; this requirement led to a continuous-time, analog circuit implementation. We characterize the operation of a 10-word (81 state) state decoder test chip.
Clustering Sequences with Hidden Markov Models
This paper discusses a probabilistic model-based approach to clustering sequences, using hidden Markov models (HMMs). The problem can be framed as a generalization of the standard mixture model approach to clustering in feature space. Two primary issues are addressed. First, a novel parameter initialization procedure is proposed, and second, the more difficult problem of determining the number of clusters K, from the data, is investigated. Experimental results indicate that the proposed techniques are useful for revealing hidden cluster structure in data sets of sequences.