Learning Graphical Models
Inferring Ground Truth from Subjective Labelling of Venus Images
Smyth, Padhraic, Fayyad, Usama M., Burl, Michael C., Perona, Pietro, Baldi, Pierre
Instead of "ground truth" one may only have the subjective opinion(s) of one or more experts. For example, medical data or image data may be collected off-line and some time later a set of experts analyze the data and produce a set of class labels. The central problem is that of trying to infer the "ground truth" given the noisy subjective estimates of the experts. When one wishes to apply a supervised learning algorithm to the data, the problem is primarily twofold: (i) how to evaluate the relative performance of experts and algorithms, and (ii) how to train a pattern recognition system in the absence of absolute ground truth. In this paper we focus on problem (i), namely the performance evaluation issue, and in particular we discuss the application of a particular modelling technique to the problem of counting volcanoes on the surface of Venus.
A Mixture Model System for Medical and Machine Diagnosis
Stensmo, Magnus, Sejnowski, Terrence J.
Diagnosis of human disease or machine fault is a missing data problem since many variables are initially unknown. Additional information needs to be obtained. The j oint probability distribution of the data can be used to solve this problem. We model this with mixture models whose parameters are estimated by the EM algorithm. This gives the benefit that missing data in the database itself can also be handled correctly. The request for new information to refine the diagnosis is performed using the maximum utility principle. Since the system is based on learning it is domain independent and less labor intensive than expert systems or probabilistic networks. An example using a heart disease database is presented.
Estimating Conditional Probability Densities for Periodic Variables
Bishop, Chris M., Legleye, Claire
Many applications of neural networks can be formulated in terms of a multivariate nonlinear mapping from an input vector x to a target vector t. A conventional neural network approach, based on least squares for example, leads to a network mapping which approximates the regression of t on x. A more complete description of the data can be obtained by estimating the conditional probability density of t, conditioned on x, which we write as p(tlx). Various techniques exist for modelling such densities when the target variables live in a Euclidean space. However, a number of potential applications involve angle-like output variables which are periodic on some finite interval (usually chosen to be (0,271")).
Factorial Learning and the EM Algorithm
Many real world learning problems are best characterized by an interaction of multiple independent causes or factors. Discovering such causal structure from the data is the focus of this paper. Based on Zemel and Hinton's cooperative vector quantizer (CVQ) architecture, an unsupervised learning algorithm is derived from the Expectation-Maximization (EM) framework. Due to the combinatorial nature of the data generation process, the exact E-step is computationally intractable. Two alternative methods for computing the E-step are proposed: Gibbs sampling and mean-field approximation, and some promising empirical results are presented.
Active Learning for Function Approximation
We develop a principled strategy to sample a function optimally for function approximation tasks within a Bayesian framework. Using ideas from optimal experiment design, we introduce an objective function (incorporating both bias and variance) to measure the degree of approximation, and the potential utility of the data points towards optimizing this objective. We show how the general strategy can be used to derive precise algorithms to select data for two cases: learning unit step functions and polynomial functions. In particular, we investigate whether such active algorithms can learn the target with fewer examples. We obtain theoretical and empirical results to suggest that this is the case. 1 INTRODUCTION AND MOTIVATION Learning from examples is a common supervised learning paradigm that hypothesizes a target concept given a stream of training examples that describes the concept. In function approximation, example-based learning can be formulated as synthesizing an approximation function for data sampled from an unknown target function (Poggio and Girosi, 1990). Active learning describes a class of example-based learning paradigms that seeks out new training examples from specific regions of the input space, instead of passively accepting examples from some data generating source.
Convergence Properties of the K-Means Algorithms
K-Means is a popular clustering algorithm used in many applications, including the initialization of more computationally expensive algorithms (Gaussian mixtures, Radial Basis Functions, Learning Vector Quantization and some Hidden Markov Models). The practice of this initialization procedure often gives the frustrating feeling that K-Means performs most of the task in a small fraction of the overall time. This motivated us to better understand this convergence speed. A second reason lies in the traditional debate between hard threshold (e.g.
Interior Point Implementations of Alternating Minimization Training
Lemmon, Michael, Szymanski, Peter T.
AM techniques were first introduced in soft-competitive learning algorithms[l]. This training procedure was later shown to be closely related to Expectation-Maximization algorithms used by the statistical estimation community[2]. Alternating minimizations search for optimal network weights by breaking the search into two distinct minimization problems. A given network performance functional is extremalized first with respect to one set of network weights and then with respect to the remaining weights. These learning procedures have found applications in the training of local expert systems [3], and in Boltzmann machine training [4]. More recently, convergence rates have been derived by viewing the AM 570 Michael Lemmon.
Diffusion of Credit in Markovian Models
Bengio, Yoshua, Frasconi, Paolo
This paper studies the problem of diffusion in Markovian models, such as hidden Markov models (HMMs) and how it makes very difficult the task of learning of long-term dependencies in sequences. Using results from Markov chain theory, we show that the problem of diffusion is reduced if the transition probabilities approach 0 or 1. Under this condition, standard HMMs have very limited modeling capabilities, but input/output HMMs can still perform interesting computations.
Deterministic Annealing Variant of the EM Algorithm
We present a deterministic annealing variant of the EM algorithm for maximum likelihood parameter estimation problems. In our approach, the EM process is reformulated as the problem of minimizing the thermodynamic free energy by using the principle of maximum entropy and statistical mechanics analogy. Unlike simulated annealing approaches, this minimization is deterministically performed. Moreover, the derived algorithm, unlike the conventional EM algorithm, can obtain better estimates free of the initial parameter values.
Learning Local Error Bars for Nonlinear Regression
Nix, David A., Weigend, Andreas S.
We present a new method for obtaining local error bars for nonlinear regression, i.e., estimates of the confidence in predicted values that depend on the input. We approach this problem by applying a maximumlikelihood framework to an assumed distribution of errors. We demonstrate our method first on computer-generated data with locally varying, normally distributed target noise. We then apply it to laser data from the Santa Fe Time Series Competition where the underlying system noise is known quantization error and the error bars give local estimates of model misspecification. In both cases, the method also provides a weightedregression effect that improves generalization performance.