 Statistical Learning


Learning Prototype Models for Tangent Distance

Neural Information Processing Systems

Local algorithms such as K-nearest neighbor (NN) perform well in pattern recognition, even though they often assume the simplest distance on the pattern space. It has recently been shown (Simard et al. 1993) that the performance can be further improved by incorporating invariance to specific transformations in the underlying distance metric, the so-called tangent distance. The resulting classifier, however, can be prohibitively slow and memory intensive due to the large number of prototypes that need to be stored and used in the distance comparisons. In this paper we address this problem for the tangent distance algorithm by developing rich models for representing large subsets of the prototypes. Our leading example of a prototype model is a low-dimensional (12) hyperplane defined by a point and a set of basis or tangent vectors.
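As a rough illustration of what comparing a pattern against such a hyperplane model involves, the sketch below computes the Euclidean distance from a query vector to the hyperplane spanned by a point and a set of tangent vectors. It is a minimal, assumption-laden example: the function names are hypothetical, the vectors are plain Python lists, and it covers only the one-sided case (a point against a hyperplane), not the two-sided tangent distance of Simard et al. or the model-fitting procedure of the paper.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def orthonormalize(vectors, eps=1e-12):
    """Gram-Schmidt: return an orthonormal basis for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > eps:  # skip (near-)linearly dependent vectors
            basis.append([wi / norm for wi in w])
    return basis

def hyperplane_distance(x, point, tangents):
    """Distance from x to the set {point + sum_i a_i * tangents[i]}.

    Hypothetical helper: removes from (x - point) every component that
    lies in the tangent subspace and returns the norm of what is left.
    """
    residual = [xi - pi for xi, pi in zip(x, point)]
    for b in orthonormalize(tangents):
        c = dot(residual, b)
        residual = [ri - c * bi for ri, bi in zip(residual, b)]
    return math.sqrt(dot(residual, residual))
```

For example, with `point = [0, 0, 0]` and `tangents = [[1, 0, 0], [1, 1, 0]]`, the hyperplane is the plane of the first two coordinates, so only the third coordinate of the query contributes to the distance.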


Recognizing Handwritten Digits Using Mixtures of Linear Models

Neural Information Processing Systems

We construct a mixture of locally linear generative models of a collection of pixel-based images of digits, and use them for recognition. Different models of a given digit are used to capture different styles of writing, and new images are classified by evaluating their log-likelihoods under each model. We use an EM-based algorithm in which the M-step is computationally straightforward principal components analysis (PCA). Incorporating tangent-plane information [12] about expected local deformations only requires adding tangent vectors into the sample covariance matrices for the PCA, and it demonstrably improves performance.



Bayesian Query Construction for Neural Network Models

Neural Information Processing Systems

If data collection is costly, there is much to be gained by actively selecting particularly informative data points in a sequential way. In a Bayesian decision-theoretic framework we develop a query selection criterion which explicitly takes into account the intended use of the model predictions. By Markov Chain Monte Carlo methods the necessary quantities can be approximated to a desired precision. As the number of data points grows, the model complexity is modified by a Bayesian model selection strategy. The properties of two versions of the criterion are demonstrated in numerical experiments.


Unsupervised Classification of 3D Objects from 2D Views

Neural Information Processing Systems

The human visual system can recognize various 3D (three-dimensional) objects from their 2D (two-dimensional) retinal images although the images vary significantly as the viewpoint changes. Recent computational models have explored how to learn to recognize 3D objects from their projected views (Poggio & Edelman, 1990). Most existing models are, however, based on supervised learning, i.e., during training the teacher tells which object each view belongs to. The model proposed by Weinshall et al. (1990) also requires a signal that segregates different objects during training. This paper, on the other hand, discusses unsupervised aspects of 3D object recognition where the system discovers categories by itself.


Factorial Learning by Clustering Features

Neural Information Processing Systems

We introduce a novel algorithm for factorial learning, motivated by segmentation problems in computational vision, in which the underlying factors correspond to clusters of highly correlated input features. The algorithm derives from a new kind of competitive clustering model, in which the cluster generators compete to explain each feature of the data set and cooperate to explain each input example, rather than competing for examples and cooperating on features, as in traditional clustering algorithms. A natural extension of the algorithm recovers hierarchical models of data generated from multiple unknown categories, each with a different, multiple causal structure. Several simulations demonstrate the power of this approach.


A Rapid Graph-based Method for Arbitrary Transformation-Invariant Pattern Classification

Neural Information Processing Systems

We present a graph-based method for rapid, accurate search through prototypes for transformation-invariant pattern classification. Our method has in theory the same recognition accuracy as other recent methods based on "tangent distance" [Simard et al., 1994], since it uses the same categorization rule. Nevertheless, ours is significantly faster during classification because far fewer tangent distances need be computed. Crucial to the success of our system are 1) a novel graph architecture in which transformation constraints and geometric relationships among prototypes are encoded during learning, and 2) an improved graph search criterion, used during classification. These architectural insights are applicable to a wide range of problem domains. Here we demonstrate that on a handwriting recognition task, a basic implementation of our system requires less than half the computation of the Euclidean sorting method.


Learning Local Error Bars for Nonlinear Regression

Neural Information Processing Systems

We present a new method for obtaining local error bars for nonlinear regression, i.e., estimates of the confidence in predicted values that depend on the input. We approach this problem by applying a maximum-likelihood framework to an assumed distribution of errors. We demonstrate our method first on computer-generated data with locally varying, normally distributed target noise. We then apply it to laser data from the Santa Fe Time Series Competition where the underlying system noise is known quantization error and the error bars give local estimates of model misspecification. In both cases, the method also provides a weighted-regression effect that improves generalization performance.
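To make the maximum-likelihood idea concrete, the sketch below assumes the errors are Gaussian with an input-dependent variance, which matches the first experiment described but is an assumption about the general method. The function names are hypothetical, and the predicted mean and variance are passed in as plain numbers rather than produced by a network.

```python
import math

def gaussian_nll(target, mean, variance):
    """Negative log-likelihood of `target` under N(mean, variance).

    Assumes Gaussian errors; `variance` plays the role of the squared
    local error bar predicted for this input.
    """
    return 0.5 * (math.log(2.0 * math.pi * variance)
                  + (target - mean) ** 2 / variance)

def nll_grad_mean(target, mean, variance):
    """Derivative of the NLL with respect to the predicted mean.

    The residual is divided by the variance, so points with large
    predicted noise pull less on the regression fit.
    """
    return (mean - target) / variance

def total_nll(targets, means, variances):
    """Sum of per-example negative log-likelihoods."""
    return sum(gaussian_nll(t, m, v)
               for t, m, v in zip(targets, means, variances))
```

The division by the variance in `nll_grad_mean` is one way to see the weighted-regression effect mentioned above: under this assumed error model, examples in noisy regions of the input space are down-weighted automatically.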


Predicting the Risk of Complications in Coronary Artery Bypass Operations using Neural Networks

Neural Information Processing Systems

MLP networks provided slightly better risk prediction than conventional logistic regression when used to predict the risk of death, stroke, and renal failure on 1257 patients who underwent coronary artery bypass operations. Bootstrap sampling was required to compare approaches and regularization provided by early stopping was an important component of improved performance. A simplified approach to generating confidence intervals for MLP risk predictions using an auxiliary "confidence MLP" was also developed. The confidence MLP is trained to reproduce the confidence bounds that were generated during training by 50 MLP networks trained using bootstrap samples. Current research is validating these results using larger data sets, exploring approaches to detect outlier patients who are so different from any training patient that accurate risk prediction is suspect, developing approaches to explaining which input features are important for an individual patient, and determining why MLP networks provide improved performance.
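The bootstrap step behind such confidence bounds can be sketched as follows. This is an illustrative outline only: the "model" here is a hypothetical stand-in that simply returns the observed event rate, not an MLP, and the percentile levels and function names are assumptions rather than details taken from the study.

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) items from `data` with replacement."""
    return [rng.choice(data) for _ in data]

def fit_mean_risk(outcomes):
    """Stand-in model: the predicted risk is the observed event rate."""
    return sum(outcomes) / len(outcomes)

def bootstrap_bounds(data, fit, n_models=50, lower=0.05, upper=0.95, seed=0):
    """Percentile bounds on the estimate across bootstrap refits.

    `fit` is refit on `n_models` resampled data sets; the bounds are
    read off the sorted estimates at the given (assumed) levels.
    """
    rng = random.Random(seed)
    estimates = sorted(fit(bootstrap_sample(data, rng))
                       for _ in range(n_models))
    lo_idx = int(lower * (n_models - 1))
    hi_idx = int(upper * (n_models - 1))
    return estimates[lo_idx], estimates[hi_idx]
```

A call such as `bootstrap_bounds([1] * 10 + [0] * 90, fit_mean_risk)` returns a lower and an upper bound on the event rate. In the setting described above, bounds of this kind computed from the 50 bootstrap-trained networks would serve as the training targets for the auxiliary confidence MLP.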


A Comparison of Discrete-Time Operator Models for Nonlinear System Identification

Neural Information Processing Systems

We present a unifying view of discrete-time operator models used in the context of finite word length linear signal processing. Comparisons are made between the recently presented gamma operator model, and the delta and rho operator models for performing nonlinear system identification and prediction using neural networks. A new model based on an adaptive bilinear transformation which generalizes all of the above models is presented.
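For orientation, the sketch below applies the shift, delta, and gamma operators to a sampled sequence using commonly quoted definitions: the shift q maps x(t) to x(t+1), the delta operator is (q - 1)/Δ, and the gamma operator is (q - (1 - c))/c. These definitions, the parameter names, and the helper functions are assumptions supplied for illustration; the abstract does not state them, and the code does not implement the rho operator or the adaptive bilinear model.

```python
def first_order_operator(x, a, b):
    """Apply (q - a) / b to the sequence x, where q x(t) = x(t + 1).

    Hypothetical helper: the output is one sample shorter than the
    input because x(t + 1) is needed at every step.
    """
    return [(x[t + 1] - a * x[t]) / b for t in range(len(x) - 1)]

def shift_op(x):
    """Shift operator q (a = 0, b = 1)."""
    return first_order_operator(x, 0.0, 1.0)

def delta_op(x, step):
    """Delta operator (q - 1) / step, a scaled forward difference."""
    return first_order_operator(x, 1.0, step)

def gamma_op(x, c):
    """Gamma operator (q - (1 - c)) / c, assumed form with 0 < c <= 1."""
    return first_order_operator(x, 1.0 - c, c)
```

Writing all three as special cases of one two-parameter form shows why a unifying view is natural: for instance, `gamma_op(x, 1.0)` reduces to the plain shift, and `delta_op` with a small step approximates a time derivative.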