Goto

Collaborating Authors

 Uncertainty


Deterministic Annealing Variant of the EM Algorithm

Neural Information Processing Systems

We present a deterministic annealing variant of the EM algorithm for maximum likelihood parameter estimation problems. In our approach, the EM process is reformulated as the problem of minimizing the thermodynamic free energy by using the principle of maximum entropy and statistical mechanics analogy. Unlike simulated annealing approaches, this minimization is deterministically performed. Moreover, the derived algorithm, unlike the conventional EM algorithm, can obtain better estimates free of the initial parameter values.


Bayesian Query Construction for Neural Network Models

Neural Information Processing Systems

If data collection is costly, there is much to be gained by actively selecting particularly informative data points in a sequential way. In a Bayesian decision-theoretic framework we develop a query selection criterion which explicitly takes into account the intended use of the model predictions. By Markov Chain Monte Carlo methods the necessary quantities can be approximated to a desired precision. As the number of data points grows, the model complexity is modified by a Bayesian model selection strategy. The properties of two versions of the criterion ate demonstrated in numerical experiments.


Deterministic Annealing Variant of the EM Algorithm

Neural Information Processing Systems

We present a deterministic annealing variant of the EM algorithm maximum likelihood parameter estimation problems. In ourfor approach, the EM process is reformulated as the problem of minimizing the thermodynamic free energy by using the principle of maximum entropy and statistical mechanics analogy. Unlike simulated deterministicallyannealing approaches, this minimization is performed. Moreover, the derived algorithm, unlike the conventional better estimates free of the initialEM algorithm, can obtain parameter values.


Estimating Conditional Probability Densities for Periodic Variables

Neural Information Processing Systems

In this paper we introduce three novel techniques for tackling such problems, and investigate their performance using syntheticdata. We then apply these techniques to the problem of extracting the distribution of wind vector directions from radar scatterometer data gathered by a remote-sensing satellite.


An Alternative Model for Mixtures of Experts

Neural Information Processing Systems

Hinton Dept. of Computer Science University of Toronto Toronto, M5S lA4, Canada Abstract We propose an alternative model for mixtures of experts which uses a different parametric form for the gating network. The modified model is trained by the EM algorithm. In comparison with earlier models-trained by either EM or gradient ascent-there is no need to select a learning stepsize. We report simulation experiments which show that the new architecture yields faster convergence. We also apply the new model to two problem domains: piecewise nonlinear function approximation and the combination of multiple previously trained classifiers. 1 INTRODUCTION For the mixtures of experts architecture (Jacobs, Jordan, Nowlan & Hinton, 1991), the EM algorithm decouples the learning process in a manner that fits well with the modular structure and yields a considerably improved rate of convergence (Jordan & Jacobs, 1994).


A Mixture Model System for Medical and Machine Diagnosis

Neural Information Processing Systems

Diagnosis of human disease or machine fault is a missing data problem since many variables are initially unknown. Additional information needs to be obtained. The j oint probability distribution of the data can be used to solve this problem. We model this with mixture models whose parameters are estimated by the EM algorithm. This gives the benefit that missing data in the database itself can also be handled correctly. The request for new information to refine the diagnosis is performed using the maximum utility principle. Since the system is based on learning it is domain independent and less labor intensive than expert systems or probabilistic networks. An example using a heart disease database is presented.


Inferring Ground Truth from Subjective Labelling of Venus Images

Neural Information Processing Systems

In practical situations, experts may visually examine the images and provide a subjective noisy estimate of the truth. Calibrating the reliability and bias of expert labellers is a nontrivial problem. In this paper we discuss some of our recent work on this topic in the context of detecting small volcanoes in Magellan SAR images of Venus. Empirical results (using the Expectation-Maximization procedure) suggest that accounting for subjective noise can be quite significant interms of quantifying both human and algorithm detection performance.


Classifying with Gaussian Mixtures and Clusters

Neural Information Processing Systems

In this paper, we derive classifiers which are winner-take-all (WTA) approximations to a Bayes classifier with Gaussian mixtures for class conditional densities. The derived classifiers include clustering based algorithms like LVQ and k-Means. We propose a constrained rank Gaussian mixtures model and derive a WTA algorithm for it. Our experiments with two speech classification tasks indicate that the constrained rank model and the WTA approximations improve the performance over the unconstrained models. 1 Introduction A classifier assigns vectors from Rn (n dimensional feature space) to one of K classes, partitioning the feature space into a set of K disjoint regions. A Bayesian classifier builds the partition based on a model of the class conditional probability densities of the inputs (the partition is optimal for the given model).


Active Learning for Function Approximation

Neural Information Processing Systems

We develop a principled strategy to sample a function optimally for function approximation tasks within a Bayesian framework. Using ideas from optimal experiment design, we introduce an objective function (incorporating both bias and variance) to measure the degree ofapproximation, and the potential utility of the data points towards optimizing this objective. We show how the general strategy canbe used to derive precise algorithms to select data for two cases: learning unit step functions and polynomial functions. In particular, we investigate whether such active algorithms can learn the target with fewer examples. We obtain theoretical and empirical resultsto suggest that this is the case. 1 INTRODUCTION AND MOTIVATION Learning from examples is a common supervised learning paradigm that hypothesizes atarget concept given a stream of training examples that describes the concept. In function approximation, example-based learning can be formulated as synthesizing anapproximation function for data sampled from an unknown target function (Poggio and Girosi, 1990). Active learning describes a class of example-based learning paradigms that seeks out new training examples from specific regions of the input space, instead of passively accepting examples from some data generating source. By judiciously selecting ex- 594 KahKay Sung, Parlha Niyogi amples instead of allowing for possible random sampling, active learning techniques can conceivably have faster learning rates and better approximation results than passive learning methods. This paper presents a Bayesian formulation for active learning within the function approximation framework.


Bayesian Query Construction for Neural Network Models

Neural Information Processing Systems

If data collection is costly, there is much to be gained by actively selecting particularlyinformative data points in a sequential way. In a Bayesian decision-theoretic framework we develop a query selection criterionwhich explicitly takes into account the intended use of the model predictions. By Markov Chain Monte Carlo methods the necessary quantities can be approximated to a desired precision. Asthe number of data points grows, the model complexity is modified by a Bayesian model selection strategy. The properties oftwo versions of the criterion ate demonstrated in numerical experiments.