Learning Graphical Models
The Power of Amnesia
Ron, Dana, Singer, Yoram, Tishby, Naftali
We propose a learning algorithm for a variable memory length Markov process. Human communication, whether given as text, handwriting, or speech, has multiple characteristic time scales. On short scales it is characterized mostly by the dynamics that generate the process, whereas on large scales, more syntactic and semantic information is carried. For that reason the conventionally used fixed memory Markov models cannot effectively capture the complexity of such structures. On the other hand, using long memory models uniformly is not practical even for a memory as short as four.
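The idea of letting the memory length depend on the context can be illustrated with a minimal sketch. This is not the learning algorithm proposed in the paper; it is a simple longest-suffix back-off, and the names `train_contexts`, `predict_next`, `max_order`, and `min_count` are assumptions introduced for illustration.

```python
from collections import Counter, defaultdict


def train_contexts(text, max_order=3):
    """Count next-symbol frequencies for every context of length 0..max_order."""
    counts = defaultdict(Counter)
    for i, symbol in enumerate(text):
        for k in range(max_order + 1):
            if i - k < 0:
                break
            counts[text[i - k:i]][symbol] += 1
    return counts


def predict_next(counts, history, max_order=3, min_count=2):
    """Predict from the longest suffix of `history` seen at least `min_count` times.

    Returns the context actually used and the next-symbol distribution.
    Short contexts are used wherever longer ones are too rare to trust,
    so the effective memory length varies from position to position.
    """
    for k in range(min(max_order, len(history)), -1, -1):
        context = history[len(history) - k:]
        total = sum(counts[context].values()) if context in counts else 0
        if total >= min_count:
            dist = {s: c / total for s, c in counts[context].items()}
            return context, dist
    return "", {}


counts = train_contexts("abracadabra", max_order=3)
print(predict_next(counts, "abr"))  # long context is frequent enough
print(predict_next(counts, "xca"))  # backs off to a shorter context
```

A fixed-order model would need a table entry for every context of the maximal length; here only the contexts that actually occur are stored, and rarely seen ones are ignored at prediction time.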
Research Issues in Qualitative and Abstract Probability
To assess the state of the art and identify issues requiring further investigation, a workshop on qualitative and abstract probability was held during the third week of November 1993. This workshop brought together a mix of active researchers from academia, industry, and government interested in the practical and theoretical impact of these abstractions on techniques, methods, and tools for solving complex AI tasks. The result was a set of specific recommendations on the most promising and important avenues for future research.
Operations for Learning with Graphical Models
This paper is a multidisciplinary review of empirical, statistical learning from a graphical model perspective. Well-known examples of graphical models include Bayesian networks, directed graphs representing a Markov chain, and undirected networks representing a Markov field. These graphical models are extended to model data analysis and empirical learning using the notation of plates. Graphical operations for simplifying and manipulating a problem are provided including decomposition, differentiation, and the manipulation of probability models from the exponential family. Two standard algorithm schemas for learning are reviewed in a graphical framework: Gibbs sampling and the expectation maximization algorithm. Using these operations and schemas, some popular algorithms can be synthesized from their graphical specification. This includes versions of linear regression, techniques for feed-forward networks, and learning Gaussian and discrete Bayesian networks from data. The paper concludes by sketching some implications for data analysis and summarizing how some popular algorithms fall within the framework presented. The main original contributions here are the decomposition techniques and the demonstration that graphical models provide a framework for understanding and developing complex learning algorithms.
A Review of Statistical Language Learning
Several factors have led to the increase in interest in this field, which is heavily influenced ... Chapter 2 describes a small fragment of probability and information theory, including brief coverage of ... Chapters 8, 9, and 10 describe recent research on more isolated aspects of parsing and language analysis.
Statistical Modeling of Cell Assemblies Activities in Associative Cortex of Behaving Monkeys
So far there has been no general method for relating extracellular electrophysiologically measured activity of neurons in the associative cortex to underlying network or "cognitive" states. We propose to model such data using a multivariate Poisson Hidden Markov Model. We demonstrate the application of this approach for temporal segmentation of the firing patterns, and for characterization of the cortical responses to external stimuli. Using such a statistical model we can significantly discriminate two behavioral modes of the monkey, and characterize them by the different firing patterns, as well as by the level of coherency of their multi-unit firing activity. Our study utilized measurements carried out on behaving Rhesus monkeys by M. Abeles, E. Vaadia, and H. Bergman, of the Hadassa Medical School of the Hebrew University. 1 Introduction Hebb hypothesized in 1949 that the basic information processing unit in the cortex is a cell-assembly which may include thousands of cells in a highly interconnected network [1].
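As a rough illustration of what a multivariate Poisson Hidden Markov Model computes, the sketch below evaluates the log-likelihood of binned spike counts with the scaled forward algorithm. It assumes that, given the hidden state, the units fire independently with state-specific Poisson rates; that assumption, the function names, and the toy numbers are illustrative and are not taken from the paper.

```python
import math


def poisson_logpmf(k, rate):
    """Log probability of observing count k under a Poisson with the given rate."""
    return k * math.log(rate) - rate - math.lgamma(k + 1)


def log_likelihood(spike_counts, start, trans, rates):
    """Scaled forward algorithm for an HMM with independent Poisson emissions.

    spike_counts[t][u]: spike count of unit u in time bin t
    start[i], trans[i][j]: initial and transition probabilities
    rates[i][u]: expected count per bin of unit u in hidden state i
    Returns the log-likelihood and the filtered state posterior at the last bin.
    """
    n = len(start)
    loglik = 0.0
    alpha = None
    for obs in spike_counts:
        emit = [
            math.exp(sum(poisson_logpmf(k, r) for k, r in zip(obs, rates[i])))
            for i in range(n)
        ]
        if alpha is None:
            alpha = [start[i] * emit[i] for i in range(n)]
        else:
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j]
                for j in range(n)
            ]
        scale = sum(alpha)          # probability of this bin given the past
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return loglik, alpha


# Toy data: two units, a low-rate state (0) and a high-rate state (1).
counts = [[7, 9], [8, 6]]
ll, posterior = log_likelihood(
    counts,
    start=[0.5, 0.5],
    trans=[[0.9, 0.1], [0.1, 0.9]],
    rates=[[1.0, 1.0], [8.0, 8.0]],
)
print(ll, posterior)
```

The filtered posterior over states is what a temporal segmentation of the firing patterns would be read from; training the rates and transitions is a separate step not shown here.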
Bayesian Learning via Stochastic Dynamics
The attempt to find a single "optimal" weight vector in conventional network training can lead to overfitting and poor generalization. Bayesian methods avoid this, without the need for a validation set, by averaging the outputs of many networks with weights sampled from the posterior distribution given the training data. This sample can be obtained by simulating a stochastic dynamical system that has the posterior as its stationary distribution.
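One way to picture sampling weights by simulating a dynamical system is Hamiltonian (hybrid) Monte Carlo. The sketch below applies it to a deliberately tiny model with a single weight, a Gaussian prior, and Gaussian noise, and then averages predictions over the sampled weights. The model, the step sizes, and all function names are assumptions chosen for illustration; the sketch does not claim to reproduce the paper's procedure or its settings.

```python
import math
import random


def neg_log_post(w, xs, ys, noise_var=1.0, prior_var=100.0):
    """Negative log posterior (up to a constant) for the model y = w * x + noise."""
    sq = sum((y - w * x) ** 2 for x, y in zip(xs, ys))
    return 0.5 * sq / noise_var + 0.5 * w * w / prior_var


def grad_neg_log_post(w, xs, ys, noise_var=1.0, prior_var=100.0):
    """Derivative of neg_log_post with respect to w."""
    return -sum(x * (y - w * x) for x, y in zip(xs, ys)) / noise_var + w / prior_var


def hmc_samples(xs, ys, n_samples=2000, step=0.05, n_leapfrog=20, seed=0):
    """Draw weight samples whose stationary distribution is the posterior."""
    rng = random.Random(seed)
    w = 0.0
    samples = []
    for _ in range(n_samples):
        p = rng.gauss(0.0, 1.0)               # fresh momentum each iteration
        w_new, p_new = w, p
        # Leapfrog integration of the Hamiltonian dynamics.
        p_new -= 0.5 * step * grad_neg_log_post(w_new, xs, ys)
        for i in range(n_leapfrog):
            w_new += step * p_new
            if i < n_leapfrog - 1:
                p_new -= step * grad_neg_log_post(w_new, xs, ys)
        p_new -= 0.5 * step * grad_neg_log_post(w_new, xs, ys)
        # Metropolis test corrects for discretization error.
        h_old = neg_log_post(w, xs, ys) + 0.5 * p * p
        h_new = neg_log_post(w_new, xs, ys) + 0.5 * p_new * p_new
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            w = w_new
        samples.append(w)
    return samples


def predictive_mean(samples, x_new):
    """Average the outputs of the sampled models instead of using one 'best' weight."""
    return sum(w * x_new for w in samples) / len(samples)


xs, ys = [1.0, 2.0, 3.0], [2.1, 3.9, 6.2]
kept = hmc_samples(xs, ys)[200:]              # discard an initial burn-in
print(predictive_mean(kept, 4.0))
```

For this toy model the posterior is Gaussian and could be computed in closed form; sampling only pays off for models, such as multilayer networks, where no closed form exists.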
Hidden Markov Models in Molecular Biology: New Algorithms and Applications
Baldi, Pierre, Chauvin, Yves, Hunkapiller, Tim, McClure, Marcella A.
Hidden Markov Models (HMMs) can be applied to several important problems in molecular biology. We introduce a new convergent learning algorithm for HMMs that, unlike the classical Baum-Welch algorithm, is smooth and can be applied online or in batch mode, with or without the usual Viterbi most likely path approximation. Left-right HMMs with insertion and deletion states are then trained to represent several protein families including immunoglobulins and kinases. In all cases, the models derived capture all the important statistical properties of the families and can be used efficiently in a number of important tasks such as multiple alignment, motif detection, and classification.
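The Viterbi most likely path mentioned above can be sketched for a generic discrete HMM as follows. This is the textbook dynamic program in log space, not the new learning algorithm of the paper, and the two-state, three-symbol parameters in the usage example are made up for illustration.

```python
import math


def viterbi(obs, start, trans, emit):
    """Most likely hidden state path for a discrete HMM.

    obs: list of observation indices
    start[i], trans[i][j], emit[i][o]: HMM probabilities
    Returns the path (list of state indices) and its log probability.
    """
    n = len(start)

    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    score = [log(start[i]) + log(emit[i][obs[0]]) for i in range(n)]
    back = []
    for o in obs[1:]:
        prev = score
        pointers, score = [], []
        for j in range(n):
            # Best predecessor of state j at this step.
            best_i = max(range(n), key=lambda i: prev[i] + log(trans[i][j]))
            pointers.append(best_i)
            score.append(prev[best_i] + log(trans[best_i][j]) + log(emit[j][o]))
        back.append(pointers)

    # Trace the best final state back to the start.
    state = max(range(n), key=lambda i: score[i])
    best_score = score[state]
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path, best_score


path, logp = viterbi(
    obs=[0, 1, 2],
    start=[0.6, 0.4],
    trans=[[0.7, 0.3], [0.4, 0.6]],
    emit=[[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]],
)
print(path, math.exp(logp))  # path is [0, 0, 1]
```

In a left-right model with insertion and deletion states, the same recursion applied to a sequence yields its alignment to the model, which is how a multiple alignment can be assembled from per-sequence paths.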
Planar Hidden Markov Modeling: From Speech to Optical Character Recognition
Levin, Esther, Pieraccini, Roberto
We propose in this paper a statistical model (planar hidden Markov model - PHMM) describing statistical properties of images. The model generalizes the single-dimensional HMM, used for speech processing, to the planar case. For this model to be useful an efficient segmentation algorithm, similar to the Viterbi algorithm for HMM, must exist. We present conditions in terms of the PHMM parameters that are sufficient to guarantee that the planar segmentation problem can be solved in polynomial time, and describe an algorithm for that. This algorithm aligns optimally the image with the model, and therefore is insensitive to elastic distortions of images. Using this algorithm a joint optimal segmentation and recognition of the image can be performed, thus overcoming the weakness of traditional OCR systems where segmentation is performed independently before the recognition leading to unrecoverable recognition errors. The PHMM approach was evaluated using a set of isolated hand-written digits. An overall digit recognition accuracy of 95% was achieved. An analysis of the results showed that even in the simple case of recognition of isolated characters, the elimination of elastic distortions enhances the performance significantly. We expect that the advantage of this approach will be even more significant for tasks such as connected writing recognition/spotting, for which there is no known high accuracy method of recognition.
A Hybrid Neural Net System for State-of-the-Art Continuous Speech Recognition
Zavaliagkos, G., Zhao, Y., Schwartz, R., Makhoul, J.
Until recently, state-of-the-art, large-vocabulary, continuous speech recognition (CSR) has employed Hidden Markov Modeling (HMM) to model speech sounds. In an attempt to improve over HMM we developed a hybrid system that integrates HMM technology with neural networks. We present the concept of a "Segmental Neural Net" (SNN) for phonetic modeling in CSR. By taking into account all the frames of a phonetic segment simultaneously, the SNN overcomes the well-known conditional-independence limitation of HMMs. In several speaker-independent experiments with the DARPA Resource Management corpus, the hybrid system showed a consistent improvement in performance over the baseline HMM system. 1 INTRODUCTION The current state of the art in continuous speech recognition (CSR) is based on the use of hidden Markov models (HMM) to model phonemes in context.