

 Learning Graphical Models


Planar Hidden Markov Modeling: From Speech to Optical Character Recognition

Neural Information Processing Systems

We propose in this paper a statistical model (planar hidden Markov model, PHMM) describing statistical properties of images. The model generalizes the one-dimensional HMM, used for speech processing, to the planar case. For this model to be useful, an efficient segmentation algorithm, similar to the Viterbi algorithm for HMMs, must exist. We present conditions in terms of the PHMM parameters that are sufficient to guarantee that the planar segmentation problem can be solved in polynomial time, and describe an algorithm for it. This algorithm optimally aligns the image with the model and is therefore insensitive to elastic distortions of images. Using this algorithm, joint optimal segmentation and recognition of the image can be performed, overcoming a weakness of traditional OCR systems, in which segmentation is performed independently before recognition and can lead to unrecoverable recognition errors. The PHMM approach was evaluated on a set of isolated handwritten digits, achieving an overall digit recognition accuracy of 95%. An analysis of the results showed that even in the simple case of recognizing isolated characters, eliminating elastic distortions enhances performance significantly. We expect the advantage of this approach to be even more significant for tasks such as connected-writing recognition/spotting, for which no high-accuracy recognition method is known.
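The abstract does not spell out the planar segmentation algorithm, but it describes it as a generalization of the Viterbi algorithm for one-dimensional HMMs. For orientation only, here is a minimal sketch of that standard 1-D Viterbi recursion for a discrete-output HMM; it is not the PHMM algorithm itself, and the function names, the dict-based model layout, and the two-state toy model are all hypothetical.

```python
import math

NEG_INF = -math.inf


def log(p):
    """Log-probability that maps 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF


def viterbi(obs, states, start, trans, emit):
    """Most likely state path for a discrete-output HMM (probabilities as dicts)."""
    # delta[s]: best log-probability of any state path ending in s at the current step
    delta = {s: log(start[s]) + log(emit[s][obs[0]]) for s in states}
    backptrs = []
    for o in obs[1:]:
        prev, delta, ptr = delta, {}, {}
        for t in states:
            # Best predecessor of state t, then extend its path by emitting o in t
            best = max(states, key=lambda s: prev[s] + log(trans[s][t]))
            delta[t] = prev[best] + log(trans[best][t]) + log(emit[t][o])
            ptr[t] = best
        backptrs.append(ptr)
    # Trace the back-pointers from the best final state
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return delta[last], path[::-1]


# Hypothetical left-to-right toy model: state B can never return to A
states = ["A", "B"]
start = {"A": 0.9, "B": 0.1}
trans = {"A": {"A": 0.6, "B": 0.4}, "B": {"A": 0.0, "B": 1.0}}
emit = {"A": {0: 0.9, 1: 0.1}, "B": {0: 0.2, 1: 0.8}}
logp, path = viterbi([0, 0, 1, 1], states, start, trans, emit)
print(path, math.exp(logp))
```

Because the recursion keeps only the best path into each state at each step, its cost is linear in sequence length. The abstract's point is that guaranteeing comparable (polynomial) cost in the planar case requires conditions on the PHMM parameters.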


Transient Signal Detection with Neural Networks: The Search for the Desired Signal

Neural Information Processing Systems

Matched filtering has been one of the most powerful techniques employed for transient detection. Here we show that a dynamic neural network outperforms the conventional approach. When the artificial neural network (ANN) is trained with supervised learning schemes, the desired signal must be supplied for all time, although we are only interested in detecting the transient. In this paper we also show the effects on the detection agreement of different strategies to construct the desired signal. The extension of the Bayes decision rule (0/1 desired signal), optimal in static classification, performs worse than desired signals constructed by random noise or prediction during the background.

1 INTRODUCTION

Detection of poorly defined waveshapes in a nonstationary, high-noise background is an important and difficult problem in signal processing.
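The conventional baseline named in this abstract is the matched filter: for a known waveform in white noise, the optimal linear detector correlates the input with the waveform template (equivalently, convolves with its time-reversed copy). The sketch below shows only that baseline, not the dynamic neural network; the template, the fixed threshold, and the function names are hypothetical.

```python
def matched_filter(signal, template):
    """Correlate the known template with the signal at every full-overlap offset.

    Correlating with the template equals convolving with its time-reversed copy,
    which is the matched filter for a known waveform in white noise.
    """
    n, m = len(signal), len(template)
    return [sum(template[k] * signal[i + k] for k in range(m)) for i in range(n - m + 1)]


def detect(signal, template, threshold):
    """Return the offsets where the matched-filter output reaches the threshold."""
    return [i for i, y in enumerate(matched_filter(signal, template)) if y >= threshold]


# Hypothetical transient [1, 2, 1] embedded at offset 2 in a noise-free background
signal = [0, 0, 1, 2, 1, 0, 0]
template = [1, 2, 1]
print(matched_filter(signal, template))  # correlation at each offset
print(detect(signal, template, 5))       # offsets declared as detections
```

The output peaks where the template lines up with the transient. In practice the threshold would be chosen from the background noise statistics rather than fixed by hand as here.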


Modeling Consistency in a Speaker Independent Continuous Speech Recognition System

Neural Information Processing Systems

We would like to incorporate speaker-dependent consistencies, such as gender, into an otherwise speaker-independent speech recognition system. In this paper we discuss a Gender Dependent Neural Network (GDNN) which can be tuned for each gender while sharing most of the speaker-independent parameters. We use a classification network to help generate gender-dependent phonetic probabilities for a statistical (HMM) recognition system. The gender classification net predicts the gender with high accuracy, 98.3% on a Resource Management test set. However, integrating the GDNN into our hybrid HMM-neural network recognizer improved the recognition score by an amount that is not statistically significant on a Resource Management test set.
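The abstract does not say exactly how the gender classifier's output is combined with the gender-dependent phonetic probabilities. One plausible scheme, assumed here purely for illustration, is a mixture: weight each gender-dependent phone distribution by the classifier's posterior for that gender and sum, so that P(phone | x) = Σ_g P(g | x) P(phone | x, g). The function name, phone labels, and numbers below are hypothetical.

```python
def combine_gender_posteriors(p_gender, p_phone_given_gender):
    """Weight each gender-dependent phone distribution by the gender posterior and sum.

    p_gender: {gender: P(gender | x)}
    p_phone_given_gender: {gender: {phone: P(phone | x, gender)}}
    Returns {phone: P(phone | x)} under the assumed mixture model.
    """
    phones = set()
    for dist in p_phone_given_gender.values():
        phones.update(dist)
    return {
        ph: sum(p_gender[g] * p_phone_given_gender[g].get(ph, 0.0) for g in p_gender)
        for ph in phones
    }


# Hypothetical frame: the classifier is fairly sure the speaker is female
p_gender = {"female": 0.9, "male": 0.1}
p_phone = {
    "female": {"aa": 0.7, "iy": 0.3},
    "male": {"aa": 0.2, "iy": 0.8},
}
print(combine_gender_posteriors(p_gender, p_phone))
```

A soft mixture like this degrades gracefully when the gender decision is uncertain, whereas a hard switch would commit to one gender-dependent network per utterance.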


A Hybrid Linear/Nonlinear Approach to Channel Equalization Problems

Neural Information Processing Systems

Channel equalization is an important problem in high-speed communications: the transmitted symbols are distorted by neighboring symbols. Traditionally, channel equalization is treated as a channel-inversion operation. One problem with this approach is that there is no direct correspondence between the error probability and the residual error produced by the channel-inversion operation. In this paper, optimal equalizer design is formulated as a classification problem. The optimal classifier can be constructed by the Bayes decision rule and is, in general, nonlinear. An efficient hybrid linear/nonlinear approach is proposed to train the equalizer. The new linear/nonlinear equalizer is shown to achieve a lower error probability than a linear equalizer on an experimental channel.
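To make the "equalization as classification" formulation concrete, here is a minimal sketch of a Bayes decision equalizer under stated assumptions: equiprobable ±1 symbols, a known FIR channel `h`, additive white Gaussian noise of standard deviation `sigma`, and an equalizer that observes `m` consecutive received samples. It enumerates every symbol sequence that can affect the window and compares the summed Gaussian likelihoods for each value of the symbol being decided. This illustrates only the optimal Bayes classifier, not the paper's hybrid linear/nonlinear training scheme; all names are hypothetical.

```python
import itertools
import math


def channel_states(h, m):
    """Enumerate noise-free received windows for every binary symbol sequence.

    seq[t] is the symbol sent t steps before the newest one, and the received
    sample r[i] = sum_j h[j] * seq[i + j].
    """
    n = m + len(h) - 1  # number of symbols that influence an m-sample window
    states = []
    for seq in itertools.product((-1, 1), repeat=n):
        r = tuple(sum(h[j] * seq[i + j] for j in range(len(h))) for i in range(m))
        states.append((seq, r))
    return states


def bayes_equalize(r, h, m, sigma, delay=0):
    """Bayes (MAP, equiprobable symbols) decision for the symbol seq[delay]."""
    score = {-1: 0.0, 1: 0.0}
    for seq, c in channel_states(h, m):
        d2 = sum((ri - ci) ** 2 for ri, ci in zip(r, c))
        # Gaussian likelihood of the observed window given this noise-free state
        score[seq[delay]] += math.exp(-d2 / (2 * sigma ** 2))
    return 1 if score[1] >= score[-1] else -1


# Hypothetical two-tap channel and a two-sample equalizer window
h, m, sigma = [1.0, 0.5], 2, 0.3
print(bayes_equalize((1.5, 1.5), h, m, sigma))   # window produced by symbols (+1, +1, +1)
print(bayes_equalize((-0.5, 1.5), h, m, sigma))  # window produced by symbols (-1, +1, +1)
```

The resulting decision boundary is a sum-of-Gaussians contour and is generally curved, which is why the abstract notes that the optimal classifier is nonlinear. The enumeration grows as 2^(m + len(h) - 1), one motivation for cheaper approximations.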



History-Dependent Attractor Neural Networks

Neural Information Processing Systems

We present a methodological framework enabling a detailed description of the performance of Hopfield-like attractor neural networks (ANN) in the first two iterations. Using the Bayesian approach, we find that performance is improved when a history-based term is included in the neuron's dynamics. A further enhancement of the network's performance is achieved by judiciously choosing the censored neurons (those which become active in a given iteration) on the basis of the magnitude of their post-synaptic potentials. The contribution of biologically plausible, censored, history-dependent dynamics is especially marked in conditions of low firing activity and sparse connectivity, two important characteristics of the mammalian cortex. In such networks, the performance attained is higher than the performance of two 'independent' iterations, which represents an upper bound on the performance of history-independent networks.


On the Use of Evidence in Neural Networks

Neural Information Processing Systems

The Bayesian "evidence" approximation has recently been employed to determine the noise and weight-penalty terms used in back-propagation. This paper shows that for neural nets it is far easier to use the exact result than the evidence approximation. Moreover, unlike the evidence approximation, the exact result neither has to be recalculated for every new data set nor requires running computer code (the exact result is in closed form). In addition, it turns out that the evidence procedure's MAP estimate for neural nets is, in toto, approximation error. Another advantage of the exact analysis is that it does not lead one to incorrect intuitions, such as the claim that using evidence one can "evaluate different priors in light of the data". This paper also discusses sufficient conditions for the evidence approximation to hold, why it can sometimes give "reasonable" results, etc.


Time Warping Invariant Neural Networks

Neural Information Processing Systems

We propose a model of Time Warping Invariant Neural Networks (TWINN) to handle time-warped continuous signals. Although TWINN is a simple modification of the well-known recurrent neural network, analysis shows that TWINN completely removes time warping and is able to handle difficult classification problems. It is also shown that TWINN has certain advantages over currently available sequential processing schemes: Dynamic Programming (DP) [1], Hidden Markov Models (HMM) [2], Time Delayed Neural Networks (TDNN) [3], and Neural Network Finite Automata (NNFA) [4]. We also analyze the time continuity employed in TWINN and point out that this kind of structure can memorize a longer input history than NNFA. This may help explain the well-accepted fact that, for grammatical inference with NNFA, one has to start training with very short strings. Our numerical example is a trajectory classification problem. This problem, which features variable sampling rates, internal states, continuous dynamics, heavily time-warped data, and deformed phase-space trajectories, is shown to be difficult for other schemes. With TWINN, this problem was learned in 100 iterations. As a benchmark, we also trained a TDNN on exactly the same problem, and, as expected, it failed completely.
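For context on what "time warping" means and how the dynamic programming scheme [1] copes with it, here is a minimal sketch of dynamic time warping (DTW) between two 1-D sequences, using an absolute-difference local cost (an assumption). It illustrates the DP baseline only; TWINN's recurrent formulation is not described in enough detail in the abstract to reproduce, and the function name is hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences.

    D[i][j] is the cheapest cost of aligning a[:i] with b[:j], where each step
    may advance in a, in b, or in both (a monotonic warping path).
    """
    inf = float("inf")
    n, m = len(a), len(b)
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


# A time-warped copy (some samples repeated) aligns at zero cost...
print(dtw_distance([0, 1, 2, 3], [0, 0, 1, 1, 2, 3, 3]))
# ...whereas reordering the samples cannot be absorbed by warping.
print(dtw_distance([0, 1, 2], [0, 2, 1]))
```

DTW removes warping by explicitly searching over alignments against stored templates, at quadratic cost per comparison; the abstract's claim is that TWINN achieves warping invariance within the network dynamics instead.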


Directional-Unit Boltzmann Machines

Neural Information Processing Systems

University of Colorado, Boulder, CO 80309-0430

We present a general formulation for a network of stochastic directional units. This formulation is an extension of the Boltzmann machine in which the units are not binary but take on values in a cyclic range, between 0 and 2π radians. The conditional distribution of a unit's stochastic state is a circular version of the Gaussian probability distribution, known as the von Mises distribution. This combination of a value and a certainty provides additional representational power in a unit. Many kinds of information can naturally be represented in terms of angular, or directional, variables. A circular range forms a suitable representation for explicitly directional information, such as wind direction, as well as for information where the underlying range is periodic, such as days of the week or months of the year.
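The von Mises density has the form f(θ; μ, κ) = exp(κ cos(θ − μ)) / (2π I₀(κ)), where the mean direction μ plays the role of the unit's value, the concentration κ its certainty, and I₀ is the modified Bessel function of order 0. The sketch below evaluates this density and maps a periodic quantity such as the day of the week onto an angle. It does not implement the Boltzmann-machine dynamics; the function names are hypothetical, and the truncated power series for I₀ is an assumption that is accurate for moderate κ.

```python
import math


def bessel_i0(x, terms=50):
    """Modified Bessel function I0 via its power series sum_k (x/2)^(2k) / (k!)^2."""
    total, term = 0.0, 1.0
    for k in range(terms):
        if k > 0:
            term *= (x / 2) ** 2 / (k * k)  # ratio of consecutive series terms
        total += term
    return total


def von_mises_pdf(theta, mu, kappa):
    """Circular analogue of the Gaussian: peaks at mu, sharper for larger kappa."""
    return math.exp(kappa * math.cos(theta - mu)) / (2 * math.pi * bessel_i0(kappa))


def day_to_angle(day, period=7):
    """Map a periodic index (e.g. day of week 0..6) onto [0, 2*pi)."""
    return 2 * math.pi * day / period


# Hypothetical unit centred on day 0 (e.g. Sunday) with moderate certainty
mu, kappa = day_to_angle(0), 2.0
for d in range(7):
    print(d, round(von_mises_pdf(day_to_angle(d), mu, kappa), 4))
```

The circular encoding places day 6 next to day 0, so it receives a much higher density than day 3; a linear 0 to 6 scale would put it at the opposite end.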


Hidden Markov Model Induction by Bayesian Model Merging

Neural Information Processing Systems

This paper describes a technique for learning both the number of states and the topology of Hidden Markov Models from examples. The induction process starts with the most specific model consistent with the training data and generalizes by successively merging states. Both the choice of states to merge and the stopping criterion are guided by the Bayesian posterior probability. We compare our algorithm with the Baum-Welch method of estimating fixed-size models, and find that it can induce minimal HMMs from data in cases where fixed estimation does not converge or requires redundant parameters to converge.

1 INTRODUCTION AND OVERVIEW

Hidden Markov Models (HMMs) are a well-studied approach to the modelling of sequence data. HMMs can be viewed as a stochastic generalization of finite-state automata, where both the transitions between states and the generation of output symbols are governed by probability distributions.
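The two mechanical steps described above can be sketched as follows: building the "most specific" model (one state per symbol occurrence, one path per training string) and merging two states by pooling their transition and emission counts. Probabilities are taken as normalized counts, and the data likelihood is computed with the forward algorithm. This is a simplified illustration, not the paper's implementation; in particular it omits the prior (assumed here to favour smaller models) that, combined with the likelihood, gives the posterior used to choose merges and to stop. All names and the toy data are hypothetical.

```python
from collections import defaultdict


def initial_model(samples):
    """Most specific HMM: one state per symbol occurrence; 'I' = start, 'F' = end."""
    trans = defaultdict(lambda: defaultdict(float))
    emit = defaultdict(lambda: defaultdict(float))
    n = 0
    for s in samples:
        prev = "I"
        for sym in s:
            n += 1
            trans[prev][n] += 1
            emit[n][sym] += 1
            prev = n
        trans[prev]["F"] += 1
    return trans, emit


def merge(trans, emit, a, b):
    """Merge state b into state a by renaming b to a and summing the counts."""
    rename = lambda q: a if q == b else q
    new_t = defaultdict(lambda: defaultdict(float))
    new_e = defaultdict(lambda: defaultdict(float))
    for q, row in trans.items():
        for r, c in row.items():
            new_t[rename(q)][rename(r)] += c
    for q, row in emit.items():
        for sym, c in row.items():
            new_e[rename(q)][sym] += c
    return new_t, new_e


def prob(table, q, x):
    """Normalized count P(x | q), or 0 if q has no outgoing counts."""
    row = table.get(q, {})
    total = sum(row.values())
    return row.get(x, 0.0) / total if total else 0.0


def likelihood(trans, emit, seq):
    """Forward algorithm: P(seq) summed over all paths from 'I' to 'F'."""
    alpha = {q: prob(trans, "I", q) * prob(emit, q, seq[0])
             for q in trans.get("I", {}) if q != "F"}
    for sym in seq[1:]:
        nxt = defaultdict(float)
        for q, a in alpha.items():
            for r in trans.get(q, {}):
                if r != "F":
                    nxt[r] += a * prob(trans, q, r) * prob(emit, r, sym)
        alpha = nxt
    return sum(a * prob(trans, q, "F") for q, a in alpha.items())


samples = ["ab", "abab"]
t, e = initial_model(samples)          # 6 states, generates only the samples
for a, b in [(1, 3), (2, 4), (1, 5), (2, 6)]:
    t, e = merge(t, e, a, b)           # collapses to a 2-state loop (ab)+
print(likelihood(t, e, "ababab"))      # unseen string now has nonzero probability
```

Merging lowers the likelihood of the training data (from 1/4 to 4/27 for these two strings) while generalizing to unseen strings such as "ababab". That trade-off is what a posterior-based merge criterion has to weigh against the prior's preference for smaller models.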