Statistical Learning


Unsupervised Classification of 3D Objects from 2D Views

Neural Information Processing Systems

Satoshi Suzuki, Hiroshi Ando
ATR Human Information Processing Research Laboratories, 2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-02, Japan
satoshi@hip.atr.co.jp, ando@hip.atr.co.jp

Abstract

This paper presents an unsupervised learning scheme for categorizing 3D objects from their 2D projected images. The scheme exploits an auto-associative network's ability to encode each view of a single object into a representation that indicates its view direction. We propose two models that employ different classification mechanisms: the first model selects an auto-associative network whose recovered view best matches the input view, and the second model is based on a modular architecture whose additional network classifies the views by splitting the input space nonlinearly. We demonstrate the effectiveness of the proposed classification models through simulations using 3D wire-frame objects.

1 INTRODUCTION

The human visual system can recognize various 3D (three-dimensional) objects from their 2D (two-dimensional) retinal images, although the images vary significantly as the viewpoint changes. Recent computational models have explored how to learn to recognize 3D objects from their projected views (Poggio & Edelman, 1990). Most existing models are, however, based on supervised learning, i.e., during training the teacher tells which object each view belongs to.
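The first model's decision rule, picking the auto-associative network whose recovered view best matches the input view, can be sketched as below. This is a minimal illustration under stated assumptions: each trained network is replaced by a linear auto-associator (projection onto an orthonormal basis), and the names `reconstruct`, `reconstruction_error` and `select_network` are hypothetical. The paper's networks are trained auto-associators; only the selection step is shown here.

```python
def reconstruct(view, basis):
    """Recover a view with a linear auto-associator (an assumed stand-in for a
    trained network): project the view onto the span of orthonormal basis vectors."""
    recovered = [0.0] * len(view)
    for b in basis:
        coeff = sum(v * bi for v, bi in zip(view, b))  # encode: coordinate along b
        for i, bi in enumerate(b):
            recovered[i] += coeff * bi  # decode: add that component back
    return recovered


def reconstruction_error(view, basis):
    """Squared distance between the input view and its recovered view."""
    recovered = reconstruct(view, basis)
    return sum((v - r) ** 2 for v, r in zip(view, recovered))


def select_network(view, networks):
    """Index of the network (object category) whose recovered view best matches the input."""
    errors = [reconstruction_error(view, basis) for basis in networks]
    return min(range(len(networks)), key=lambda k: errors[k])


# Hypothetical example: network 0 spans axis 0, network 1 spans axes 1 and 2.
networks = [[[1.0, 0.0, 0.0]], [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]]
print(select_network([0.1, 0.9, 0.5], networks))  # 1
```

Because the rule only compares reconstruction errors, any auto-associator that exposes a "recover this view" operation could be plugged into `select_network` unchanged.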


A Mixture Model System for Medical and Machine Diagnosis

Neural Information Processing Systems

Diagnosis of human disease or machine fault is a missing data problem, since many variables are initially unknown. Additional information needs to be obtained. The joint probability distribution of the data can be used to solve this problem. We model this with mixture models whose parameters are estimated by the EM algorithm. This gives the benefit that missing data in the database itself can also be handled correctly. The request for new information to refine the diagnosis is performed using the maximum utility principle. Since the system is based on learning, it is domain independent and less labor-intensive than expert systems or probabilistic networks. An example using a heart disease database is presented.
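Why a mixture model handles missing variables gracefully can be sketched for the special case of Gaussian components with diagonal covariance (an assumption; the abstract does not specify the component form): a missing variable is marginalized out simply by skipping its factor, and its value can then be estimated from the posterior over components. EM training and the maximum-utility choice of the next question are not shown, and `component_posteriors` and `predict_missing` are hypothetical names.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def component_posteriors(x, weights, means, variances):
    """P(component | observed entries of x). Entries equal to None are missing;
    with diagonal covariances (assumed), marginalizing them means skipping their factors."""
    log_terms = []
    for w, mu, var in zip(weights, means, variances):
        lp = math.log(w)
        for xi, m, v in zip(x, mu, var):
            if xi is not None:
                lp += log_gauss(xi, m, v)
        log_terms.append(lp)
    top = max(log_terms)  # subtract the max before exponentiating, for numerical stability
    unnorm = [math.exp(lp - top) for lp in log_terms]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def predict_missing(x, weights, means, variances):
    """Fill each missing entry with its posterior-weighted conditional mean.
    With diagonal components, a component's conditional mean for entry i is mu[i]."""
    post = component_posteriors(x, weights, means, variances)
    return [xi if xi is not None else sum(p * mu[i] for p, mu in zip(post, means))
            for i, xi in enumerate(x)]


# Hypothetical two-component model over two variables.
weights = [0.5, 0.5]
means = [[0.0, 0.0], [10.0, 10.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
print(predict_missing([None, None], weights, means, variances))  # [5.0, 5.0]
```

With nothing observed the estimate falls back to the prior mixture mean; observing the first variable near 0 moves almost all posterior mass to the first component.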


Predicting the Risk of Complications in Coronary Artery Bypass Operations using Neural Networks

Neural Information Processing Systems

MLP networks provided slightly better risk prediction than conventional logistic regression when used to predict the risk of death, stroke, and renal failure for 1257 patients who underwent coronary artery bypass operations. Bootstrap sampling was required to compare the approaches, and the regularization provided by early stopping was an important component of the improved performance. A simplified approach to generating confidence intervals for MLP risk predictions using an auxiliary "confidence MLP" was also developed. The confidence MLP is trained to reproduce the confidence bounds that were generated during training by 50 MLP networks trained on bootstrap samples. Current research is validating these results using larger data sets, exploring approaches to detect outlier patients who are so different from any training patient that accurate risk prediction is suspect, developing approaches to explain which input features are important for an individual patient, and determining why MLP networks provide improved performance.
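The confidence bounds produced by the 50 bootstrap networks can be illustrated with a percentile interval over the ensemble's predictions for one patient. This is a sketch under assumptions: the abstract does not say how the bounds were computed, so the percentile method and the 90% coverage are illustrative choices, and a trivial base-rate estimator stands in for each bootstrap-trained MLP. The auxiliary confidence MLP would then be trained to reproduce such bounds from a patient's inputs.

```python
import random


def bootstrap_sample(data, rng):
    """Resample a training set with replacement, keeping its original size."""
    return [rng.choice(data) for _ in data]


def percentile(sorted_vals, q):
    """Linearly interpolated percentile of pre-sorted values, with q in [0, 1]."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def ensemble_interval(predictions, coverage=0.90):
    """Two-sided percentile interval over one patient's ensemble predictions
    (the 90% default is an illustrative assumption)."""
    s = sorted(predictions)
    alpha = (1 - coverage) / 2
    return percentile(s, alpha), percentile(s, 1 - alpha)


# Hypothetical data: 100 outcomes (1 = complication). Each bootstrap "model" here is
# just the resample's base rate, standing in for an MLP trained on that resample.
rng = random.Random(0)
outcomes = [1] * 12 + [0] * 88
predictions = [sum(s) / len(s) for s in (bootstrap_sample(outcomes, rng) for _ in range(50))]
low, high = ensemble_interval(predictions)
```

The percentile method needs no distributional assumption about the ensemble, which is one reason it is a common default for bootstrap intervals.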


Recognizing Handwritten Digits Using Mixtures of Linear Models

Neural Information Processing Systems

We construct a mixture of locally linear generative models of a collection of pixel-based images of digits, and use them for recognition. Different models of a given digit are used to capture different styles of writing, and new images are classified by evaluating their log-likelihoods under each model. We use an EM-based algorithm in which the M-step is computationally straightforward principal components analysis (PCA). Incorporating tangent-plane information [12] about expected local deformations only requires adding tangent vectors into the sample covariance matrices for the PCA, and it demonstrably improves performance.
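The tangent-plane step can be sketched as follows: tangent vectors for a model's expected local deformations are added, as outer products, to that model's sample covariance before its principal components are extracted. The `weight` scaling of the tangent term and the power-iteration PCA are assumptions for illustration; the abstract gives no exact weighting, and a practical implementation would use an eigen-decomposition routine and keep several components per model.

```python
def covariance_with_tangents(samples, tangents, weight=1.0):
    """Sample covariance plus weight * t t^T for each tangent vector t
    (the weight parameter is a hypothetical choice)."""
    n, d = len(samples), len(samples[0])
    mean = [sum(x[i] for x in samples) / n for i in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for x in samples:
        c = [xi - mi for xi, mi in zip(x, mean)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += c[i] * c[j] / n
    for t in tangents:
        for i in range(d):
            for j in range(d):
                cov[i][j] += weight * t[i] * t[j]
    return mean, cov


def leading_component(cov, iters=200):
    """Top principal direction by power iteration. Assumes a unique largest eigenvalue
    and that the all-ones start vector is not orthogonal to its eigenvector."""
    d = len(cov)
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(wi * wi for wi in w) ** 0.5
        v = [wi / norm for wi in w]
    return v


# Hypothetical 2-D "images": the samples vary only along axis 0,
# while the supplied tangent vector points along axis 1.
samples = [[-1.0, 0.0], [1.0, 0.0]]
_, plain = covariance_with_tangents(samples, [])
_, augmented = covariance_with_tangents(samples, [[0.0, 2.0]])
```

In this toy case the principal direction moves from axis 0 (data only) to axis 1 once the tangent term is added, showing how the deformation directions enter each local linear model.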


Connectionist Speaker Normalization with Generalized Resource Allocating Networks

Neural Information Processing Systems

The paper presents a rapid speaker-normalization technique based on neural network spectral mapping. The neural network is used as the front end of a continuous speech recognition system (speaker-dependent, HMM-based) to normalize the input acoustic data from a new speaker. The spectral difference between speakers can be reduced using a limited amount of new acoustic data (40 phonetically rich sentences). The phone recognition error on the APASCI acoustic-phonetic continuous speech corpus is reduced, with an adaptability ratio of 25%. We used local basis networks of elliptical Gaussian kernels, with recursive allocation of units and online optimization of parameters (GRAN model). For this application, the model included a linear term. The results compare favorably with multivariate linear mapping based on constrained orthonormal transformations.
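A local basis network of elliptical Gaussian kernels with an added linear term can be sketched as below, mapping a new speaker's spectral vector toward the reference speaker's space. Assumptions: the kernels are axis-aligned (diagonal widths), the allocation test follows the novelty-plus-error rule of resource-allocating networks rather than the paper's exact GRAN criterion, and online parameter updates are omitted. All names and thresholds are hypothetical.

```python
import math


def gaussian_unit(x, center, widths):
    """Axis-aligned elliptical Gaussian kernel (diagonal widths are an assumption)."""
    return math.exp(-0.5 * sum(((xi - ci) / si) ** 2
                               for xi, ci, si in zip(x, center, widths)))


def gran_forward(x, units, lin_weights, bias):
    """Output = linear term (lin_weights @ x + bias) + sum of weighted kernel responses.
    units: list of (center, widths, out_weights), one out_weight per output dimension."""
    out = [b + sum(a * xi for a, xi in zip(row, x)) for row, b in zip(lin_weights, bias)]
    for center, widths, w in units:
        phi = gaussian_unit(x, center, widths)
        for k in range(len(out)):
            out[k] += w[k] * phi
    return out


def should_allocate(x, error, units, dist_threshold, err_threshold):
    """Hypothetical resource-allocating test: add a unit when the input is far from
    every existing center AND the current output error is large."""
    if not units:
        return True
    nearest = min(math.dist(x, center) for center, _, _ in units)
    return nearest > dist_threshold and error > err_threshold


# Hypothetical 2-D mapping: identity linear term plus one kernel at the origin.
units = [([0.0, 0.0], [1.0, 1.0], [1.0, 0.0])]
identity, zero_bias = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
print(gran_forward([0.0, 0.0], units, identity, zero_bias))  # [1.0, 0.0]
```

The linear term lets the network capture the global part of the spectral difference, leaving the kernels to model local corrections.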


Non-linear Prediction of Acoustic Vectors Using Hierarchical Mixtures of Experts

Neural Information Processing Systems

S. R. Waterhouse, A. J. Robinson

We are concerned in this paper with the application of multiple models, specifically the Hierarchical Mixtures of Experts, to time series prediction, in particular the problem of predicting acoustic vectors for use in speech coding. There have been a number of applications of multiple models in time series prediction. A classic example is the Threshold Autoregressive model (TAR), which was used by Tong & Lim (1980) to predict sunspot activity. More recently, Lewis, Kay and Stevens (in Weigend & Gershenfeld, 1994) describe the use of Multivariate Adaptive Regression Splines (MARS) for predicting future values of currency exchange rates. Finally, in speech prediction, Cuperman & Gersho (1985) describe the Switched Inter-frame Vector Prediction (SIVP) method, which switches between separate linear predictors trained on different statistical classes of speech.
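A Hierarchical Mixture of Experts prediction can be sketched recursively: each gating node blends its children's outputs with softmax weights computed from the input, and each leaf is a linear expert, here a linear predictor over previous values. This is an illustrative forward pass only (EM training is omitted), using a hypothetical tuple-based tree encoding; the soft gating contrasts with the hard regime switch of a TAR model.

```python
import math


def softmax(zs):
    m = max(zs)  # shift by the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, bias, x):
    return bias + sum(w * xi for w, xi in zip(weights, x))


def hme_predict(node, x):
    """node is ("expert", weights, bias) or ("gate", [(weights, bias), ...], [child, ...]).
    Gates mix their children's predictions with input-dependent softmax weights."""
    if node[0] == "expert":
        _, weights, bias = node
        return linear(weights, bias, x)
    _, gate_params, children = node
    g = softmax([linear(w, b, x) for w, b in gate_params])
    return sum(gi * hme_predict(child, x) for gi, child in zip(g, children))


# Hypothetical two-level tree over a 1-D context x = [previous value].
leaf_a = ("expert", [0.5], 0.0)
leaf_b = ("expert", [0.0], 3.0)
lower = ("gate", [([0.0], 0.0), ([0.0], 0.0)], [leaf_a, leaf_b])  # uniform blend
tree = ("gate", [([10.0], 0.0), ([-10.0], 0.0)], [lower, leaf_b])  # soft switch on sign of x
print(hme_predict(lower, [2.0]))  # 2.0
```

With large gate weights the top node behaves almost like a threshold switch on the sign of the previous value, so a TAR-style model appears as a limiting case of the soft gating.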



Efficient Methods for Dealing with Missing Data in Supervised Learning

Neural Information Processing Systems

Palo Alto, CA 94304

Abstract

We present efficient algorithms for dealing with the problem of missing inputs (incomplete feature vectors) during training and recall. Our approach is based on approximating the input data distribution using Parzen windows. For recall, we obtain closed-form solutions for arbitrary feedforward networks. For training, we show how the backpropagation step for an incomplete pattern can be approximated by a weighted averaged backpropagation step. The complexity of the solutions for training and recall is independent of the number of missing features.
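For recall, the idea of averaging a trained network's output over a Parzen estimate of the missing inputs can be sketched as below: each training pattern is weighted by a Gaussian kernel on the observed features, and the network is evaluated with that pattern's values filling in the missing ones. The narrow-kernel approximation (evaluating the network at the kernel centers for the missing dimensions), the shared width `sigma` and the name `parzen_recall` are assumptions. The cost is one network evaluation per training pattern, regardless of how many features are missing.

```python
import math


def parzen_recall(x, train_inputs, network, sigma):
    """Approximate E[network(x) | observed part of x] under a Parzen-window input density.
    x uses None for missing features; network is any callable on a full input vector.
    Assumes a narrow kernel, so missing features are taken from each kernel center."""
    observed = [i for i, v in enumerate(x) if v is not None]
    sq_dists = [sum((x[i] - xk[i]) ** 2 for i in observed) for xk in train_inputs]
    d_min = min(sq_dists)  # shift so the largest weight is exactly 1 (avoids 0/0 underflow)
    num = den = 0.0
    for xk, d2 in zip(train_inputs, sq_dists):
        w = math.exp(-(d2 - d_min) / (2 * sigma ** 2))
        filled = [x[i] if x[i] is not None else xk[i] for i in range(len(x))]
        num += w * network(filled)
        den += w
    return num / den


# Hypothetical network and training inputs.
def net(v):
    return v[0] + v[1]


train = [[0.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
print(parzen_recall([0.0, None], train, net, sigma=0.1))  # 5.0
```

Only the first two training patterns match the observed feature, so the estimate averages the network outputs 0 and 10 obtained by filling in their values for the missing feature.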


Classifying with Gaussian Mixtures and Clusters

Neural Information Processing Systems

In this paper, we derive classifiers that are winner-take-all (WTA) approximations to a Bayes classifier with Gaussian mixtures for the class-conditional densities. The derived classifiers include clustering-based algorithms such as LVQ and k-Means. We propose a constrained-rank Gaussian mixture model and derive a WTA algorithm for it. Our experiments with two speech classification tasks indicate that the constrained-rank model and the WTA approximations improve performance over the unconstrained models.

1 Introduction

A classifier assigns vectors from R^n (the n-dimensional feature space) to one of K classes, partitioning the feature space into a set of K disjoint regions. A Bayesian classifier builds the partition based on a model of the class-conditional probability densities of the inputs (the partition is optimal for the given model).
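The difference between the Bayes classifier and its winner-take-all approximation can be sketched as follows: the Bayes rule scores each class by its prior times the full mixture density, while the WTA version keeps only the best-scoring component of each class. Isotropic Gaussian components and the dictionary layout are assumptions for illustration. With equal priors, equal mixing weights and a shared variance, the WTA rule reduces to assigning x to the class of the nearest component mean, which is consistent with prototype methods such as LVQ and k-Means appearing as special cases.

```python
import math


def log_gauss_iso(x, mean, var):
    """Log density of an isotropic Gaussian (an assumed component form)."""
    d = len(x)
    sq = sum((a - b) ** 2 for a, b in zip(x, mean))
    return -0.5 * (d * math.log(2 * math.pi * var) + sq / var)


def logsumexp(vals):
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))


def classify(x, classes, wta=False):
    """classes: {label: (log_prior, [(log_weight, mean, var), ...])}.
    Bayes: log prior + log of the summed mixture; WTA: log prior + best single component."""
    best_label, best_score = None, -math.inf
    for label, (log_prior, comps) in classes.items():
        terms = [lw + log_gauss_iso(x, mu, var) for lw, mu, var in comps]
        score = log_prior + (max(terms) if wta else logsumexp(terms))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical classes: "a" has two components, "b" has one.
classes = {
    "a": (math.log(0.5), [(math.log(0.5), [1.0, 0.0], 1.0), (math.log(0.5), [-1.0, 0.0], 1.0)]),
    "b": (math.log(0.5), [(0.0, [0.0, 1.2], 1.0)]),
}
print(classify([0.0, 0.0], classes), classify([0.0, 0.0], classes, wta=True))  # a b
```

At the origin the two components of class "a" jointly outweigh class "b", but neither does alone, so the full Bayes rule and its WTA approximation disagree there.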


Convergence Properties of the K-Means Algorithms

Neural Information Processing Systems

K-Means is a popular clustering algorithm used in many applications, including the initialization of more computationally expensive algorithms (Gaussian mixtures, Radial Basis Functions, Learning Vector Quantization and some Hidden Markov Models). In practice, this initialization procedure often gives the frustrating impression that K-Means performs most of the task in a small fraction of the overall time. This motivated us to better understand this convergence speed. A second reason lies in the traditional debate between hard threshold (e.g.