

Learning Graphical Models


Modeling Acoustic Correlations by Factor Analysis

Neural Information Processing Systems

Hidden Markov models (HMMs) for automatic speech recognition rely on high-dimensional feature vectors to summarize the short-time properties of speech. Correlations between features can arise when the speech signal is non-stationary or corrupted by noise. We investigate how to model these correlations using factor analysis, a statistical method for dimensionality reduction. Factor analysis uses a small number of parameters to model the covariance structure of high-dimensional data. These parameters are estimated by an Expectation-Maximization (EM) algorithm that can be embedded in the training procedures for HMMs.
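
As a rough illustration of why factor analysis is economical, the sketch below builds the covariance implied by a k-factor model, Sigma = Lambda Lambda^T + diag(psi), and counts raw parameters against an unrestricted covariance. It is not the authors' EM procedure; the feature dimension (39) and factor count (3) in the usage line are assumed purely for illustration.

```python
def fa_covariance(loadings, psi):
    """Covariance implied by a factor model x = Lambda z + u:
    Sigma = Lambda Lambda^T + diag(psi), with loadings a d x k nested list."""
    d, k = len(loadings), len(loadings[0])
    cov = [[sum(loadings[i][f] * loadings[j][f] for f in range(k)) for j in range(d)]
           for i in range(d)]
    for i in range(d):
        cov[i][i] += psi[i]  # add the diagonal "uniqueness" variances
    return cov


def n_params_full(d):
    """Free parameters of an unrestricted symmetric d x d covariance matrix."""
    return d * (d + 1) // 2


def n_params_fa(d, k):
    """Raw parameters of a k-factor model: d*k loadings plus d diagonal variances."""
    return d * k + d


# Illustrative sizes (assumed, not from the paper): 39-dimensional features, 3 factors.
print(n_params_full(39), n_params_fa(39, 3))  # 780 156
```

The count grows linearly in d for a fixed number of factors, instead of quadratically.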


Bayesian Robustification for Audio Visual Fusion

Neural Information Processing Systems

Department of Cognitive Science, University of California, San Diego, La Jolla, CA 92092-0515

Abstract

We discuss the problem of catastrophic fusion in multimodal recognition systems. This problem arises in systems that need to fuse different channels in non-stationary environments. Practice shows that when recognition modules within each modality are tested in contexts inconsistent with their assumptions, their influence on the fused product tends to increase, with catastrophic results. We explore a principled solution to this problem based upon Bayesian ideas of competitive models and inference robustification: each sensory channel is provided with simple white-noise context models, and the perceptual hypothesis and context are jointly estimated. Consequently, context deviations are interpreted as changes in white-noise contamination strength, automatically adjusting the influence of the module. The approach is tested on a fixed-lexicon automatic audiovisual speech recognition problem with very good results.

1 Introduction

In this paper we address the problem of catastrophic fusion in automatic multimodal recognition systems.
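
One possible reading of the robustification idea above, sketched under loud assumptions: each channel's likelihood for a hypothesis is an isotropic Gaussian whose variance is inflated by an unknown white-noise term, and that term is estimated jointly with the hypothesis by maximising over a small grid. The grid, templates, channel names and data are all hypothetical, and grid maximisation stands in for whatever estimator the paper actually uses.

```python
import math

# Hypothetical grid of candidate white-noise variances for the context model.
NOISE_GRID = (0.0, 0.5, 1.0, 2.0, 4.0, 8.0, 16.0)


def gauss_loglik(x, mean, var):
    """Log-density of vector x under an isotropic Gaussian N(mean, var * I)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, mean))
    return -0.5 * (len(x) * math.log(2 * math.pi * var) + sq / var)


def channel_score(x, template, base_var, noise_grid):
    """Score one channel for one hypothesis, estimating the white-noise
    contamination jointly with the hypothesis (maximum over the grid)."""
    return max(gauss_loglik(x, template, base_var + s) for s in noise_grid)


def fuse(observations, templates, base_var=1.0, noise_grid=NOISE_GRID):
    """observations: {channel: vector}; templates: {word: {channel: vector}}.
    Returns the best word and the summed per-channel scores."""
    scores = {
        word: sum(channel_score(observations[c], tmpl[c], base_var, noise_grid)
                  for c in observations)
        for word, tmpl in templates.items()
    }
    return max(scores, key=scores.get), scores


templates = {
    "yes": {"audio": [0, 0, 0, 0], "video": [0, 0]},
    "no": {"audio": [3, 3, 3, 3], "video": [2, 2]},
}
# Audio is heavily corrupted; video is clean and close to "yes".
obs = {"audio": [10, -6, 12, -8], "video": [0.1, -0.1]}
print(fuse(obs, templates)[0])                     # robust fusion: yes
print(fuse(obs, templates, noise_grid=(0.0,))[0])  # fixed noise level: no
```

With the noise level fixed, the corrupted audio channel dominates and drags the decision to "no"; letting the noise level grow flattens the audio likelihoods, so the clean video channel decides.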


Analysis of Drifting Dynamics with Neural Network Hidden Markov Models

Neural Information Processing Systems

We present a method for the analysis of non-stationary time series with multiple operating modes. In particular, it can detect and model both an abrupt switch in the dynamics and a less abrupt, time-consuming drift from one mode to another. This is achieved in two steps. First, an unsupervised training method provides prediction experts for the inherent dynamical modes. Then, the trained experts are used in a hidden Markov model that allows drifts to be modeled. An application to physiological wake/sleep data demonstrates that the analysis and modeling of real-world time series can be improved when the drift paradigm is taken into account.
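
The second step can be pictured with a simplified sketch: given already-trained experts, a Viterbi search runs over "pure" states (one expert) and "drift" states (a fixed convex mix of two experts), scoring squared prediction error plus a constant penalty for changing state. The fixed mixing weights, the switch penalty and the toy experts are assumptions made for this illustration; the paper's HMM may parameterise drift and transitions differently.

```python
def segment_with_drift(preds, targets, alphas=(0.25, 0.5, 0.75), switch_cost=1.0):
    """Viterbi segmentation over 'pure' expert states and 'drift' states.

    preds[i][t] is expert i's prediction at time t, targets[t] the observed value.
    A state (i, j, a) predicts a * preds[i][t] + (1 - a) * preds[j][t]; pure
    states use j == i and a == 1.0. The emission cost is squared prediction
    error, and every change of state costs switch_cost.
    """
    n = len(preds)
    states = [(i, i, 1.0) for i in range(n)]
    states += [(i, j, a) for i in range(n) for j in range(i + 1, n) for a in alphas]

    def cost(state, t):
        i, j, a = state
        return (targets[t] - (a * preds[i][t] + (1 - a) * preds[j][t])) ** 2

    best = [cost(s, 0) for s in states]  # cheapest path ending in each state
    back = []                            # back-pointers, one list per time step
    for t in range(1, len(targets)):
        step_best, ptr = [], []
        for k, s in enumerate(states):
            trans = [best[m] + (0.0 if m == k else switch_cost) for m in range(len(states))]
            m = min(range(len(states)), key=trans.__getitem__)
            step_best.append(trans[m] + cost(s, t))
            ptr.append(m)
        best = step_best
        back.append(ptr)

    k = min(range(len(states)), key=best.__getitem__)
    path = [k]
    for ptr in reversed(back):
        k = ptr[k]
        path.append(k)
    return [states[k] for k in reversed(path)]


# Hypothetical experts: expert 0 always predicts 0.0, expert 1 always predicts 1.0.
preds = [[0.0] * 9, [1.0] * 9]
targets = [0, 0, 0, 0.5, 0.5, 0.5, 1, 1, 1]
print(segment_with_drift(preds, targets, switch_cost=0.1))
```

On this toy series the path stays with expert 0, passes through the 50/50 drift state, and ends with expert 1.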


Graph Matching with Hierarchical Discrete Relaxation

Neural Information Processing Systems

Our aim in this paper is to develop a Bayesian framework for matching hierarchical relational models. The goal is to make discrete label assignments that optimise a global cost function drawing information about the consistency of the match from different levels of the hierarchy.


An Incremental Nearest Neighbor Algorithm with Queries

Neural Information Processing Systems

We consider the general problem of learning multi-category classification from labeled examples. We present experimental results for a nearest neighbor algorithm that actively selects samples from different pattern classes according to a querying rule instead of the a priori class probabilities. The amount of improvement of this query-based approach over the passive batch approach depends on the complexity of the Bayes rule. The principle on which this algorithm is based is general enough to be used in any learning algorithm that permits a model-selection criterion and for which the error rate of the classifier is calculable in terms of the complexity of the model.

1 INTRODUCTION

We consider the general problem of learning multi-category classification from labeled examples. In many practical learning settings the time or sample size available for training is limited. This may adversely affect the accuracy of the resulting classifier. For instance, in learning to recognize handwritten characters, a typical time limitation confines the training sample size to the order of a few hundred examples. It is therefore important to make learning more efficient by obtaining only training data that contain significant information about the separability of the pattern classes, thereby letting the learning algorithm participate actively in the sampling process. Querying for the class labels of specifically selected examples in the input space may lead to significant improvements in the generalization error (cf.
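
To make "selecting samples by class according to a querying rule" concrete, here is a minimal sketch with a 1-nearest-neighbour classifier. The rule used (request the next example from the class with the most leave-one-out errors) is a hypothetical stand-in, not the rule from the paper, and the per-class samplers are invented toy distributions.

```python
import random


def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def nn_classify(train, x):
    """1-nearest-neighbour label for point x; train is a list of (point, label)."""
    return min(train, key=lambda pair: sq_dist(pair[0], x))[1]


def loo_errors(train):
    """Leave-one-out 1-NN misclassification count for each class label."""
    errs = {label: 0 for _, label in train}
    for i, (x, y) in enumerate(train):
        rest = train[:i] + train[i + 1:]
        if rest and nn_classify(rest, x) != y:
            errs[y] += 1
    return errs


def query_learn(samplers, budget, seed=0):
    """samplers maps each class label to a function rng -> point.

    Start with one example per class, then repeatedly request a new labeled
    example from the class with the most leave-one-out errors (a hypothetical
    querying rule) instead of sampling classes by their prior probabilities.
    """
    rng = random.Random(seed)
    train = [(draw(rng), label) for label, draw in samplers.items()]
    while len(train) < budget:
        errs = loo_errors(train)
        label = max(errs, key=errs.get)
        train.append((samplers[label](rng), label))
    return train


samplers = {"a": lambda r: [r.uniform(0, 1)], "b": lambda r: [r.uniform(2, 3)]}
train = query_learn(samplers, budget=10)
print(nn_classify(train, [0.2]), nn_classify(train, [2.7]))  # a b
```

Any other error-based criterion could be substituted for `loo_errors` without changing the surrounding loop.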


Learning Path Distributions Using Nonequilibrium Diffusion Networks

Neural Information Processing Systems

Department of Mathematics, University of California, San Diego, La Jolla, CA 92093-0112

Abstract

We propose diffusion networks, a type of recurrent neural network with probabilistic dynamics, as models for learning natural signals that are continuous in time and space. We give a formula for the gradient of the log-likelihood of a path with respect to the drift parameters of a diffusion network. This gradient can be used to optimize diffusion networks in the nonequilibrium regime for a wide variety of problems, paralleling techniques that have succeeded in engineering fields such as system identification, state estimation and signal filtering. An aspect of this work of particular interest to computational neuroscience and hardware design is that, with a suitable choice of activation function (e.g., quasi-linear sigmoidal), the gradient formula is local in space and time.

1 Introduction

Many natural signals, like pixel gray-levels, line orientations, object position, velocity and shape parameters, are well described as continuous-time, continuous-valued stochastic processes; however, the neural network literature has seldom explored the continuous stochastic case. Since the solutions to many decision-theoretic problems of interest are naturally formulated using probability distributions, it is desirable to have a flexible framework for approximating probability distributions on continuous path spaces.
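
For intuition about a path log-likelihood gradient with respect to drift parameters, the sketch below uses the standard time-discretised form for a one-dimensional diffusion with an assumed linear drift mu(x) = -theta * x. Each increment is Gaussian, so the gradient is a sum of per-step terms (d mu / d theta) * (dX - mu dt) / sigma^2 that use only local quantities. This is a toy model, not the paper's network or its exact formula; the derivative is checked against a finite difference.

```python
import math
import random


def simulate(theta, x0, dt, steps, sigma, seed=0):
    """Euler-Maruyama path of dX = -theta * X dt + sigma dW (hypothetical 1-D drift)."""
    rng = random.Random(seed)
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x - theta * x * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0))
    return xs


def path_loglik(theta, xs, dt, sigma):
    """Discretised path log-likelihood: each increment dX is Gaussian with
    mean mu(x) dt and variance sigma^2 dt, where mu(x) = -theta * x."""
    ll = 0.0
    for x, x_next in zip(xs, xs[1:]):
        r = (x_next - x) - (-theta * x) * dt  # increment minus its predicted mean
        ll -= r * r / (2 * sigma ** 2 * dt) + 0.5 * math.log(2 * math.pi * sigma ** 2 * dt)
    return ll


def path_loglik_grad(theta, xs, dt, sigma):
    """Gradient of path_loglik w.r.t. theta: sum over steps of
    (d mu / d theta) * (dX - mu dt) / sigma^2, with d mu / d theta = -x.
    Each term uses only the current state and increment."""
    g = 0.0
    for x, x_next in zip(xs, xs[1:]):
        r = (x_next - x) - (-theta * x) * dt
        g += (-x) * r / sigma ** 2
    return g


xs = simulate(theta=0.8, x0=1.0, dt=0.01, steps=500, sigma=0.3)
h = 1e-5
numeric = (path_loglik(0.5 + h, xs, 0.01, 0.3) - path_loglik(0.5 - h, xs, 0.01, 0.3)) / (2 * h)
print(path_loglik_grad(0.5, xs, 0.01, 0.3), numeric)  # the two values agree closely
```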


Estimating Dependency Structure as a Hidden Variable

Neural Information Processing Systems

This paper introduces a probability model, the mixture of trees, that can account for sparse, dynamically changing dependence relationships. We present a family of efficient algorithms that use EM and the Minimum Spanning Tree algorithm to find the ML and MAP mixture of trees for a variety of priors, including the Dirichlet and the MDL priors.
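
The spanning-tree step can be illustrated with the classic Chow-Liu construction: compute (optionally weighted) pairwise mutual information and keep a maximum-weight spanning tree, which is the same as a minimum spanning tree on negated mutual information. The sketch covers only the ML tree fit for one component. Treating `weights` as EM responsibilities is an assumption about how it would plug into the mixture, and the Dirichlet/MDL priors are not modelled.

```python
import math
from itertools import combinations


def mutual_information(data, weights, i, j):
    """Weighted empirical mutual information (nats) between discrete columns i and j."""
    total = sum(weights)
    pij, pi, pj = {}, {}, {}
    for row, w in zip(data, weights):
        a, b = row[i], row[j]
        pij[(a, b)] = pij.get((a, b), 0.0) + w / total
        pi[a] = pi.get(a, 0.0) + w / total
        pj[b] = pj.get(b, 0.0) + w / total
    return sum(p * math.log(p / (pi[a] * pj[b])) for (a, b), p in pij.items() if p > 0)


def chow_liu_tree(data, weights=None):
    """Maximum-weight spanning tree over pairwise mutual information (Kruskal).

    This equals a minimum spanning tree on negated mutual information. In an EM
    loop for a mixture of trees, `weights` would hold one component's posterior
    responsibilities for each data row.
    """
    if weights is None:
        weights = [1.0] * len(data)
    n_vars = len(data[0])
    edges = sorted(
        ((mutual_information(data, weights, i, j), i, j)
         for i, j in combinations(range(n_vars), 2)),
        reverse=True,
    )
    parent = list(range(n_vars))  # union-find forest

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    tree = []
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # adding (i, j) does not create a cycle
            parent[ri] = rj
            tree.append((i, j))
    return tree


# Toy binary data: column 1 copies column 0; in this sample columns 2 and 3
# carry no information about column 0 or about each other.
data = [(0, 0, 0, 0), (1, 1, 1, 0), (0, 0, 1, 1), (1, 1, 0, 1)]
print(chow_liu_tree(data))  # the first edge chosen is (0, 1)
```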


An Application of Reversible-Jump MCMC to Multivariate Spherical Gaussian Mixtures

Neural Information Processing Systems

Applications of Gaussian mixture models occur frequently in the fields of statistics and artificial neural networks. One of the key issues arising in any mixture model application is how to estimate the optimum number of mixture components. This paper extends the Reversible-Jump Markov Chain Monte Carlo (MCMC) algorithm to the case of multivariate spherical Gaussian mixtures using a hierarchical prior model. Using this method, the number of mixture components is no longer fixed but becomes a parameter of the model, which we estimate. The Reversible-Jump MCMC algorithm is capable of moving between parameter subspaces that correspond to models with different numbers of mixture components. As a result, a sample from the full joint distribution of all unknown model parameters is generated. The technique is then demonstrated on a simulated example and a well-known vowel dataset.

1 Introduction

Applications of Gaussian mixture models regularly appear in the neural networks literature. One of their most common roles in the field of neural networks is in the placement of centres in a radial basis function network.


Hierarchical Non-linear Factor Analysis and Topographic Maps

Neural Information Processing Systems

We first describe a hierarchical, generative model that can be viewed as a nonlinear generalisation of factor analysis and can be implemented in a neural network. The model performs perceptual inference in a probabilistically consistent manner by using top-down, bottom-up and lateral connections. These connections can be learned using simple rules that require only locally available information. We then show how to incorporate lateral connections into the generative model. The model extracts a sparse, distributed, hierarchical representation of depth from simplified random-dot stereograms, and the localised disparity detectors in the first hidden layer form a topographic map. When presented with image patches from natural scenes, the model develops topographically organised local feature detectors.


A Revolution: Belief Propagation in Graphs with Cycles

Neural Information Processing Systems

Department of Physics, Cavendish Laboratory, Cambridge University

Abstract

Until recently, artificial intelligence researchers have frowned upon the application of probability propagation in Bayesian belief networks that have cycles. The probability propagation algorithm is only exact in networks that are cycle-free. However, it has recently been discovered that the two best error-correcting decoding algorithms are actually performing probability propagation in belief networks with cycles.

1 Communicating over a noisy channel

Our increasingly wired world demands efficient methods for communicating bits of information over physical channels that introduce errors. Examples of real-world channels include twisted-pair telephone wires, shielded cable-TV wire, fiber-optic cable, deep-space radio, terrestrial radio, and indoor radio. Engineers attempt to correct the errors introduced by the noise in these channels through channel coding, which adds protection to the information source so that some channel errors can be corrected.
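
The contrast between cycle-free and cyclic networks can be seen on a tiny example. The sketch runs sum-product (probability) propagation on a pairwise model over binary variables and compares its beliefs with brute-force marginals. On a chain the two agree exactly; on a three-node cycle the same updates still run but yield only approximate beliefs. The potentials and graphs are hypothetical, and this is a generic model rather than one of the decoders mentioned above.

```python
import itertools
import math


def loopy_bp(n, unary, pair, iters=50):
    """Sum-product belief propagation on a pairwise model over binary variables.

    unary[i][s]: local evidence for node i taking state s.
    pair[(i, j)][s][t]: compatibility for x_i = s, x_j = t on edge (i, j).
    Messages are updated synchronously and normalised. On a tree the beliefs
    are exact marginals; on a graph with cycles they are approximations.
    """
    nbrs = {i: [] for i in range(n)}
    for i, j in pair:
        nbrs[i].append(j)
        nbrs[j].append(i)

    def psi(i, j, s, t):
        return pair[(i, j)][s][t] if (i, j) in pair else pair[(j, i)][t][s]

    msg = {(i, j): [1.0, 1.0] for i in nbrs for j in nbrs[i]}
    for _ in range(iters):
        new = {}
        for i, j in msg:
            # m_{i->j}(t) = sum_s unary_i(s) psi_ij(s, t) prod_{k != j} m_{k->i}(s)
            m = [sum(unary[i][s] * psi(i, j, s, t)
                     * math.prod(msg[(k, i)][s] for k in nbrs[i] if k != j)
                     for s in (0, 1))
                 for t in (0, 1)]
            new[(i, j)] = [v / sum(m) for v in m]
        msg = new

    beliefs = []
    for i in range(n):
        b = [unary[i][s] * math.prod(msg[(k, i)][s] for k in nbrs[i]) for s in (0, 1)]
        beliefs.append([v / sum(b) for v in b])
    return beliefs


def exact_marginals(n, unary, pair):
    """Brute-force marginals by enumerating all 2**n joint states (tiny n only)."""
    marg = [[0.0, 0.0] for _ in range(n)]
    for x in itertools.product((0, 1), repeat=n):
        w = math.prod(unary[i][x[i]] for i in range(n))
        w *= math.prod(tab[x[i]][x[j]] for (i, j), tab in pair.items())
        for i in range(n):
            marg[i][x[i]] += w
    return [[v / sum(m) for v in m] for m in marg]


agree = [[0.9, 0.1], [0.1, 0.9]]  # hypothetical "neighbours agree" potential
unary = [[0.7, 0.3], [0.5, 0.5], [0.2, 0.8]]
chain = {(0, 1): agree, (1, 2): agree}                 # a tree: BP is exact
cycle = {(0, 1): agree, (1, 2): agree, (2, 0): agree}  # one loop: BP approximates
print(loopy_bp(3, unary, chain), exact_marginals(3, unary, chain))
print(loopy_bp(3, unary, cycle), exact_marginals(3, unary, cycle))
```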