Goto

Collaborating Authors

 Learning Graphical Models


Bayesian Learning via Stochastic Dynamics

Neural Information Processing Systems

The attempt to find a single "optimal" weight vector in conventional network training can lead to overfitting and poor generalization. Bayesian methods avoid this, without the need for a validation set, by averaging the outputs of many networks with weights sampled from the posterior distribution given the training data. This sample can be obtained by simulating a stochastic dynamical system that has the posterior as its stationary distribution.


The Computation of Stereo Disparity for Transparent and for Opaque Surfaces

Neural Information Processing Systems

The classical computational model for stereo vision incorporates a uniqueness inhibition constraint to enforce a one-to-one feature match, thereby sacrificing the ability to handle transparency. Critics of the model disregard the uniqueness constraint and argue that the smoothness constraint can provide the excitation support required for transparency computation. However, this modification fails in neighborhoods with sparse features. We propose a Bayesian approach to stereo vision with priors favoring cohesive over transparent surfaces. The disparity and its segmentation into a multi-layer "depth planes" representation are simultaneously computed. The smoothness constraint propagates support within each layer, providing mutual excitation for non-neighboring transparent or partially occluded regions. Test results for various random-dot and other stereograms are presented.


Time Warping Invariant Neural Networks

Neural Information Processing Systems

We proposed a model of Time Warping Invariant Neural Networks (TWINN) to handle the time warped continuous signals. Although TWINN is a simple modification of well known recurrent neural network, analysis has shown that TWINN completely removes time warping and is able to handle difficult classification problem. It is also shown that TWINN has certain advantages over the current available sequential processing schemes: Dynamic Programming(DP)[I], Hidden Markov Model( HMM)[2], Time Delayed Neural Networks(TDNN) [3] and Neural Network Finite Automata(NNFA)[4]. We also analyzed the time continuity employed in TWINN and pointed out that this kind of structure can memorize longer input history compared with Neural Network Finite Automata (NNFA). This may help to understand the well accepted fact that for learning grammatical reference with NNF A one had to start with very short strings in training set. The numerical example we used is a trajectory classification problem. This problem, making a feature of variable sampling rates, having internal states, continuous dynamics, heavily time-warped data and deformed phase space trajectories, is shown to be difficult to other schemes. With TWINN this problem has been learned in 100 iterations. For benchmark we also trained the exact same problem with TDNN and completely failed as expected.


Directional-Unit Boltzmann Machines

Neural Information Processing Systems

University of Toronto University of Toronto University of Colorado Toronto, ONT M5S lA4 Toronto, ONT M5S lA4 Boulder, CO 80309-0430 Abstract We present a general formulation for a network of stochastic directional units. This formulation is an extension of the Boltzmann machine in which the units are not binary, but take on values in a cyclic range, between 0 and 271' radians. The conditional distribution of a unit's stochastic state is a circular version of the Gaussian probability distribution, known as the von Mises distribution. This combination of a value and a certainty provides additional representational power in a unit. Many kinds of information can naturally be represented in terms of angular, or directional, variables.


Hidden Markov Model Induction by Bayesian Model Merging

Neural Information Processing Systems

This paper describes a technique for learning both the number of states and the topology of Hidden Markov Models from examples. The induction process starts with the most specific model consistent with the training data and generalizes by successively merging states. Both the choice of states to merge and the stopping criterion are guided by the Bayesian posterior probability. We compare our algorithm with the Baum-Welch method of estimating fixed-size models, and find that it can induce minimal HMMs from data in cases where fixed estimation does not converge or requires redundant parameters to converge. 1 INTRODUCTION AND OVERVIEW Hidden Markov Models (HMMs) are a well-studied approach to the modelling of sequence data. HMMs can be viewed as a stochastic generalization of finite-state automata, where both the transitions between states and the generation of output symbols are governed by probability distributions. HMMs have been important in speech recognition (Rabiner & Juang, 1986), cryptography, and more recently in other areas such as protein classification and alignment (Haussler, Krogh, Mian & SjOlander, 1992; Baldi, Chauvin, Hunkapiller & McClure, 1993). Practitioners have typically chosen the HMM topology by hand, so that learning the HMM from sample data means estimating only a fixed number of model parameters. The standard approach is to find a maximum likelihood (ML) or maximum a posteriori probability (MAP) estimate of the HMM parameters.


Bayesian Learning via Stochastic Dynamics

Neural Information Processing Systems

The attempt to find a single "optimal" weight vector in conventional networktraining can lead to overfitting and poor generalization. Bayesian methods avoid this, without the need for a validation set, by averaging the outputs of many networks with weights sampled from the posterior distribution given the training data. This sample can be obtained by simulating a stochastic dynamical system that has the posterior as its stationary distribution.


A Hybrid Neural Net System for State-of-the-Art Continuous Speech Recognition

Neural Information Processing Systems

Untill recently, state-of-the-art, large-vocabulary, continuous speech recognition (CSR) has employed Hidden Markov Modeling (HMM) to model speech sounds. In an attempt to improve over HMM we developed a hybrid system that integrates HMM technology with neural We present the concept of a "Segmental Neural Net"networks.


Information, Prediction, and Query by Committee

Neural Information Processing Systems

We analyze the "query by committee" algorithm, a method for filtering informativequeries from a random stream of inputs. We show that if the two-member committee algorithm achieves information gainwith positive lower bound, then the prediction error decreases exponentially with the number of queries. We show that, in particular, this exponential decrease holds for query learning of thresholded smooth functions.



Hidden Markov Models in Molecular Biology: New Algorithms and Applications

Neural Information Processing Systems

Hidden Markov Models (HMMs) can be applied to several important problemsin molecular biology. We introduce a new convergent learning algorithm for HMMs that, unlike the classical Baum-Welch algorithm is smooth and can be applied online or in batch mode, with or without the usual Viterbi most likely path approximation. Left-right HMMs with insertion and deletion states are then trained to represent several protein families including immunoglobulins and kinases. In all cases, the models derived capture all the important statistical properties of the families and can be used efficiently in a number of important tasks such as multiple alignment, motif detection, andclassification.