Goto

Collaborating Authors

 Statistical Learning


An Analog VLSI Splining Network

Neural Information Processing Systems

We have produced a VLSI circuit capable of learning to approximate arbitrary smooth of a single variable using a technique closely related to splines. The circuit effectively has 512 knots space on a uniform grid and has full support for learning. The circuit also can be used to approximate multi-variable functions as sum of splines. An interesting, and as of yet, nearly untapped set of applications for VLSI implementation of neural network learning systems can be found in adaptive control and nonlinear signal processing. In most such applications, the learning task consists of approximating a real function of a small number of continuous variables from discrete data points.


Kohonen Networks and Clustering: Comparative Performance in Color Clustering

Neural Information Processing Systems

"vector quantization", and "unsupervised learning" are all words which descn'be the same process: assigning a few exemplars to represent a large set of samples. Perfonning that process is the subject of a substantial body of literature. In this paper, we are concerned with the comparison of various clustering techniques to a particular, practical application: color clustering. The color clustering problem is as follows: an image is recorded in full color -- that is, three components, RED, GREEN, and BLUE, each of which has been measured to 8 bits of precision. Thus, each pixel is a 24 bit quantity. We must find a representation in which 2563 possible colors are represented by only 8 bits per pixel. That is, for a problem with 256000 variables (512 x 512) variables, assign each variable to one of only 256 classes. The color clustering problem is currently of major economic interest since millions of display systems are sold each year which can only store 8 bits per pixel, but on which users would like to be able to display "true" color (or at least as near true color as possible). In this study, we have approached the problem using the standard techniques from the literature (including k-means -- ISODATA clustering[1,3,61, LBG[4]), competitive learning (referred to as CL herein) [2], and Kohonen feature maps [5,7,9].


A Comparative Study of the Practical Characteristics of Neural Network and Conventional Pattern Classifiers

Neural Information Processing Systems

Seven different neural network and conventional pattern classifiers were compared using artificial and speech recognition tasks. High order polynomial GMDH classifiers typically provided intermediate error rates and often required long training times and large amounts of memory. In addition, the decision regions formed did not generalize well to regions of the input space with little training data. Radial basis function classifiers generalized well in high dimensional spaces, and provided low error rates with training times that were much less than those of back-propagation classifiers (Lee and Lippmann, 1989). Gaussian mixture classifiers provided good performance when the numbers and types of mixtures were selected carefully to model class densities well. Linear tree classifiers were the most computationally ef- 976 Ng and Lippmann ficient but performed poorly with high dimensionality inputs and when the number of training patterns was small. KD-tree classifiers reduced classification time by a factor of four over conventional KNN classifiers for low 2-input dimension problems. They provided little or no reduction in classification time for high 22-input dimension problems. Improved condensed KNN classifiers reduced memory requirements over conventional KNN classifiers by a factor of two to fifteen for all problems, without increasing the error rate significantly.


Comparison of three classification techniques: CART, C4.5 and Multi-Layer Perceptrons

Neural Information Processing Systems

In this paper, after some introductory remarks into the classification problem as considered in various research communities, and some discussions concerning some of the reasons for ascertaining the performances of the three chosen algorithms, viz., CART (Classification and Regression Tree), C4.5 (one of the more recent versions of a popular induction tree technique known as ID3), and a multi-layer perceptron (MLP), it is proposed to compare the performances of these algorithms under two criteria: classification and generalisation. It is found that, in general, the MLP has better classification and generalisation accuracies compared with the other two algorithms. 1 Introduction Classification of data into categories has been pursued by a number of research communities, viz., applied statistics, knowledge acquisition, neural networks. In applied statistics, there are a number of techniques, e.g., clustering algorithms (see e.g., Hartigan), CART (Classification and Regression Trees, see e.g., Breiman et al). Clustering algorithms are used when the underlying data naturally fall into a number of groups, the distance among groups are measured by various metrics [Hartigan]. CART [Breiman, et all has been very popular among applied statisticians. It assumes that the underlying data can be separated into categories, the decision boundaries can either be parallel to the axis or they can be a linear combination of these axes!. Under certain assumptions on the input data and their associated lIn CART, and C4.5, the axes are the same as the input features


Asymptotic slowing down of the nearest-neighbor classifier

Neural Information Processing Systems

M2/n' for sufficiently large values of M. Here, Poo(error) denotes the probability of error in the infinite sample limit, and is at most twice the error of a Bayes classifier. Although the value of the coefficient a depends upon the underlying probability distributions, the exponent of M is largely distribution free. We thus obtain a concise relation between a classifier's ability to generalize from a finite reference sample and the dimensionality of the feature space, as well as an analytic validation of Bellman's well known "curse of dimensionality." 1 INTRODUCTION One of the primary tasks assigned to neural networks is pattern classification. Common applications include recognition problems dealing with speech, handwritten characters, DNA sequences, military targets, and (in this conference) sexual identity. Two fundamental concepts associated with pattern classification are generalization (how well does a classifier respond to input data it has never encountered before?) and scalability (how are a classifier's processing and training requirements affected by increasing the number of features that describe the input patterns?).


Note on Learning Rate Schedules for Stochastic Optimization

Neural Information Processing Systems

We present and compare learning rate schedules for stochastic gradient descent, a general algorithm which includes LMS, online backpropagation and k-means clustering as special cases. We introduce "search-thenconverge" type schedules which outperform the classical constant and "running average" (1ft) schedules both in speed of convergence and quality of solution.


Using Genetic Algorithms to Improve Pattern Classification Performance

Neural Information Processing Systems

Feature selection and creation are two of the most important and difficult tasks in the field of pattern classification. Good features improve the performance of both conventional and neural network pattern classifiers. Exemplar selection is another task that can reduce the memory and computation requirements of a KNN classifier.


A Framework for the Cooperation of Learning Algorithms

Neural Information Processing Systems

We introduce a framework for training architectures composed of several modules. This framework, which uses a statistical formulation of learning systems, provides a unique formalism for describing many classical connectionist algorithms as well as complex systems where several algorithms interact. It allows to design hybrid systems which combine the advantages of connectionist algorithms as well as other learning algorithms.


Oriented Non-Radial Basis Functions for Image Coding and Analysis

Neural Information Processing Systems

We introduce oriented non-radial basis function networks (ONRBF) as a generalization of Radial Basis Function networks (RBF)- wherein the Euclidean distance metric in the exponent of the Gaussian is replaced by a more general polynomial. This permits the definition of more general regions and in particular-hyper-ellipses with orientations. In the case of hyper-surface estimation this scheme requires a smaller number of hidden units and alleviates the "curse of dimensionality" associated kernel type approximators.In the case of an image, the hidden units correspond to features in the image and the parameters associated with each unit correspond to the rotation, scaling and translation properties of that particular "feature". In the context of the ONBF scheme, this means that an image can be represented by a small number of features. Since, transformation of an image by rotation, scaling and translation correspond to identical transformations of the individual features, the ONBF scheme can be used to considerable advantage for the purposes of image recognition and analysis.