 Statistical Learning


Fast Non-Linear Dimension Reduction

Neural Information Processing Systems

Dimension reduction provides compact representations for storage, transmission, and classification. Dimension reduction algorithms operate by identifying and eliminating statistical redundancies in the data. The optimal linear technique for dimension reduction is principal component analysis (PCA).
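As a rough illustration of linear dimension reduction with PCA, the sketch below finds the leading principal direction of a small data set and projects each point onto it. It uses power iteration on the sample covariance matrix, which is one common way to obtain the top component and is not taken from the abstract. The function names `top_principal_component` and `project` and the toy data are assumptions made for this example.

```python
import math

def top_principal_component(data, iters=200):
    """Return (mean, unit vector) for the leading principal direction.

    Uses power iteration on the sample covariance matrix. This assumes the
    starting vector is not orthogonal to the leading eigenvector.
    """
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in data]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0 / (j + 1) for j in range(d)]  # arbitrary non-symmetric start
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v

def project(data, mean, v):
    """Reduce each point to a single coordinate along direction v."""
    return [sum((row[j] - mean[j]) * v[j] for j in range(len(v)))
            for row in data]

# Toy data lying exactly on the line y = x, so one coordinate suffices.
points = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
mean, direction = top_principal_component(points)
codes = project(points, mean, direction)
```

Because the toy points are perfectly correlated, the one-dimensional codes retain all of the variance; on real data the discarded components carry the residual error.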


Unsupervised Parallel Feature Extraction from First Principles

Neural Information Processing Systems

We describe a number of learning rules that can be used to train unsupervised parallel feature extraction systems. The learning rules are derived using gradient ascent of a quality function. We consider a number of quality functions that are rational functions of higher order moments of the extracted feature values. We show that one system learns the principal components of the correlation matrix. Principal component analysis systems are usually not optimal feature extractors for classification.
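To make the idea of gradient ascent on a quality function concrete, here is a minimal sketch for a single linear feature. The quality function used, Q(w) = (w^T C w) / (w^T w), is an assumption chosen for illustration: it is a rational function of second-order moments whose maximizer is the leading principal direction of C. The abstract does not specify its quality functions, so this should not be read as the authors' learning rule. The matrix `C`, the step size, and the function names are likewise hypothetical.

```python
def rayleigh_quotient(C, w):
    """Return Q(w) = w^T C w / w^T w, along with C w and w^T w."""
    d = len(w)
    Cw = [sum(C[i][j] * w[j] for j in range(d)) for i in range(d)]
    num = sum(wi * cwi for wi, cwi in zip(w, Cw))
    den = sum(wi * wi for wi in w)
    return num / den, Cw, den

def gradient_ascent(C, w, rate=0.1, steps=500):
    """Climb Q(w) with plain gradient steps.

    The gradient of Q is 2 * (C w - Q w) / (w^T w).
    """
    for _ in range(steps):
        q, Cw, den = rayleigh_quotient(C, w)
        grad = [2.0 * (cwi - q * wi) / den for cwi, wi in zip(Cw, w)]
        w = [wi + rate * gi for wi, gi in zip(w, grad)]
    q, _, _ = rayleigh_quotient(C, w)
    return w, q

# Assumed 2x2 correlation matrix with eigenvalues 3 and 1.
C = [[2.0, 1.0], [1.0, 2.0]]
w, q = gradient_ascent(C, [1.0, 0.0])
```

With this matrix the weights line up with the direction (1, 1) and the quality value approaches the largest eigenvalue, 3.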


Supervised learning from incomplete data via an EM approach

Neural Information Processing Systems

Real-world learning tasks may involve high-dimensional data sets with arbitrary patterns of missing data. In this paper we present a framework based on maximum likelihood density estimation for learning from such data sets. We use mixture models for the density estimates and make two distinct appeals to the Expectation Maximization (EM) principle (Dempster et al., 1977) in deriving a learning algorithm: EM is used both for the estimation of mixture components and for coping with missing data. The resulting algorithm is applicable to a wide range of supervised as well as unsupervised learning problems.


Learning Classification with Unlabeled Data

Neural Information Processing Systems

We represent objects with n-dimensional pattern vectors and consider piecewise-linear classifiers consisting of a collection of (labeled) codebook vectors in the space of the input patterns (see Figure 1). The classification boundaries are given by the Voronoi tessellation of the codebook vectors. Patterns are said to belong to the class (given by the label) of the codebook vector to which they are closest.
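The decision rule described above can be sketched in a few lines: a pattern receives the label of its nearest codebook vector, which implicitly partitions the input space into Voronoi cells. The function name `classify`, the use of squared Euclidean distance, and the example codebook are assumptions for illustration only. How the codebook vectors are learned is not shown.

```python
def classify(pattern, codebook):
    """Return the label of the codebook vector closest to `pattern`.

    `codebook` is a list of (vector, label) pairs. Several vectors may share
    a label, which is what makes the class boundary piecewise linear.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    _, label = min(codebook, key=lambda entry: sq_dist(pattern, entry[0]))
    return label

# Hypothetical codebook: two vectors for class "A", one for class "B".
codebook = [([0.0, 0.0], "A"), ([4.0, 0.0], "B"), ([0.0, 4.0], "A")]
label_near_b = classify([3.0, 1.0], codebook)
label_near_a = classify([1.0, 1.0], codebook)
```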


Central and Pairwise Data Clustering by Competitive Neural Networks

Neural Information Processing Systems

Data clustering amounts to a combinatorial optimization problem to reduce the complexity of a data representation and to increase its precision. Central and pairwise data clustering are studied in the maximum entropy framework. For central clustering we derive a set of reestimation equations and a minimization procedure which yields an optimal number of clusters, their centers and their cluster probabilities. A mean-field approximation for pairwise clustering is used to estimate assignment probabilities. A self-consistent solution to multidimensional scaling and pairwise clustering is derived which yields an optimal embedding and clustering of data points in a d-dimensional Euclidean space.

1 Introduction

A central problem in information processing is the reduction of the data complexity with minimal loss in precision to discard noise and to reveal basic structure of data sets. Data clustering addresses this tradeoff by optimizing a cost function which preserves the original data as completely as possible and which simultaneously favors prototypes with minimal complexity (Linde et al., 1980; Gray, 1984; Chou et al., 1989; Rose et al., 1990). We discuss an objective function for the joint optimization of distortion errors and the complexity of a reduced data representation. A maximum entropy estimation of the cluster assignments yields a unifying framework for clustering algorithms with a number of different distortion and complexity measures. The close analogy of complexity optimized clustering with winner-take-all neural networks suggests a neural-like implementation resembling topological feature maps (see Figure 1).
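A simplified sketch of maximum entropy central clustering is given below. It alternates between Gibbs-form soft assignments, with probabilities proportional to exp(-distortion / T), and reestimation of each center as the assignment-weighted mean. Several things here are assumptions rather than the paper's equations: the distortion is plain squared Euclidean distance, the complexity term is omitted, the temperature `T` is fixed, and the number of clusters is given in advance rather than optimized.

```python
import math

def soft_assignments(points, centers, temperature):
    """Maximum entropy assignment probabilities for each point."""
    probs = []
    for x in points:
        costs = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in centers]
        low = min(costs)  # subtract the minimum for numerical stability
        weights = [math.exp(-(c - low) / temperature) for c in costs]
        total = sum(weights)
        probs.append([w / total for w in weights])
    return probs

def reestimate_centers(points, probs, k):
    """Each center becomes the probability-weighted mean of the points."""
    d = len(points[0])
    centers = []
    for a in range(k):
        mass = sum(p[a] for p in probs)
        centers.append([sum(p[a] * x[j] for p, x in zip(probs, points)) / mass
                        for j in range(d)])
    return centers

def cluster(points, centers, temperature, iters=50):
    """Iterate the two reestimation steps at a fixed temperature."""
    for _ in range(iters):
        probs = soft_assignments(points, centers, temperature)
        centers = reestimate_centers(points, probs, len(centers))
    return centers, probs

# Toy one-dimensional data with two obvious groups.
points = [[0.0], [0.2], [5.0], [5.2]]
centers, probs = cluster(points, [[1.0], [4.0]], temperature=0.1)
```

At low temperature the assignments become nearly hard and the update behaves like k-means; at high temperature every point is shared almost equally among the centers.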


Credit Assignment through Time: Alternatives to Backpropagation

Neural Information Processing Systems

Learning to recognize or predict sequences using long-term context has many applications. However, practical and theoretical problems are found in training recurrent neural networks to perform tasks in which input/output dependencies span long intervals. Starting from a mathematical analysis of the problem, we consider and compare alternative algorithms and architectures on tasks for which the span of the input/output dependencies can be controlled. Results on the new algorithms show performance qualitatively superior to that obtained with backpropagation.

1 Introduction

Recurrent neural networks have been considered to learn to map input sequences to output sequences. Machines that could efficiently learn such tasks would be useful for many applications involving sequence prediction, recognition or production. However, practical difficulties have been reported in training recurrent neural networks to perform tasks in which the temporal contingencies present in the input/output sequences span long intervals. In fact, we can prove that dynamical systems such as recurrent neural networks will be increasingly difficult to train with gradient descent as the duration of the dependencies to be captured increases. A mathematical analysis of the problem shows that either one of two conditions arises in such systems.
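As a rough numerical illustration of why long dependencies are hard for gradient descent (this is not the authors' analysis), consider a single recurrent unit x_t = tanh(w * x_{t-1} + u_t). By the chain rule, the sensitivity of the final state to the initial state is a product of one factor per time step, and when every factor has magnitude below 1 the product shrinks geometrically with the span. The scalar model, the weight value, and the function name are assumptions chosen for this sketch.

```python
import math

def gradient_through_time(w, inputs, x0=0.0):
    """Return d x_T / d x_0 for the recurrence x_t = tanh(w * x_{t-1} + u_t).

    Each step contributes a factor w * (1 - x_t ** 2) to the product.
    """
    x, grad = x0, 1.0
    for u in inputs:
        x = math.tanh(w * x + u)
        grad *= w * (1.0 - x * x)
    return grad

# With zero inputs and x0 = 0 the state stays at 0, so the gradient is w ** T.
short_span = abs(gradient_through_time(0.9, [0.0] * 5))
long_span = abs(gradient_through_time(0.9, [0.0] * 50))
```

Here the sensitivity over 50 steps is roughly a hundred times smaller than over 5 steps, so an error signal about a distant input contributes very little to the weight update.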




Fast Pruning Using Principal Components

Neural Information Processing Systems

The assumption is that there exists an underlying (possibly noisy) functional relationship relating the outputs to the inputs, y = f(u, e), where e denotes the noise. The aim of the learning process is to approximate this relationship based on the training set.


Memory-Based Methods for Regression and Classification

Neural Information Processing Systems

Memory-based learning methods operate by storing all (or most) of the training data and deferring analysis of that data until "run time" (i.e., when a query is presented and a decision or prediction must be made). When a query is received, these methods generally answer the query by retrieving and analyzing a small subset of the training data, namely data in the immediate neighborhood of the query point. In short, memory-based methods are "lazy" (they wait until the query) and "local" (they use only a local neighborhood). The purpose of this workshop was to review the state-of-the-art in memory-based methods and to understand their relationship to "eager" and "global" learning algorithms such as batch backpropagation. There are two essential components to any memory-based algorithm: the method for defining the "local neighborhood" and the learning method that is applied to the training examples in the local neighborhood.
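The two components named above can be seen in a minimal k-nearest-neighbor regressor. The neighborhood is defined as the k stored examples closest to the query in squared Euclidean distance, and the local learning method is simply the average of their targets. Both choices, along with the function name `knn_predict` and the toy memory, are assumptions for illustration. Other memory-based methods substitute, for example, kernel-weighted neighborhoods or a locally fitted linear model.

```python
def knn_predict(query, memory, k=3):
    """Predict a target for `query` from the k nearest stored examples.

    `memory` is a list of (vector, target) pairs. No work is done at
    training time: all computation is deferred until the query arrives.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Component 1: define the local neighborhood.
    neighbors = sorted(memory, key=lambda item: sq_dist(query, item[0]))[:k]
    # Component 2: apply a (very simple) learner to that neighborhood.
    return sum(target for _, target in neighbors) / len(neighbors)

# Hypothetical stored training data for the function y = x.
memory = [([0.0], 0.0), ([1.0], 1.0), ([2.0], 2.0), ([10.0], 10.0)]
prediction = knn_predict([1.1], memory, k=3)
```

The distant example at x = 10 has no influence on the answer, which is the sense in which the method is local.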