Statistical Learning
Learning by Combining Memorization and Gradient Descent
We have created a radial basis function network that allocates a new computational unit whenever an unusual pattern is presented to the network. The network learns by allocating new units and adjusting the parameters of existing units. If the network performs poorly on a presented pattern, then a new unit is allocated which memorizes the response to the presented pattern. If the network performs well on a presented pattern, then the network parameters are updated using standard LMS gradient descent. For predicting the Mackey-Glass chaotic time series, our network learns much faster than do those using back-propagation and uses a comparable number of synapses.
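The allocate-or-update rule described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class name, the thresholds `err_tol` and `dist_tol`, the width rule, and the learning rate are all hypothetical choices, and the LMS step here adjusts only the unit heights and the bias.

```python
import math


class AllocatingRBFNet:
    """Toy Gaussian RBF network that either memorizes a pattern or takes an LMS step."""

    def __init__(self, err_tol=0.05, dist_tol=0.3, width_scale=0.9, lr=0.05):
        self.err_tol = err_tol          # assumed threshold for "performs poorly"
        self.dist_tol = dist_tol        # assumed threshold for "unusual pattern"
        self.width_scale = width_scale  # assumed rule for sizing a new unit
        self.lr = lr                    # assumed LMS learning rate
        self.centers, self.widths, self.heights = [], [], []
        self.bias = 0.0

    def _activations(self, x):
        # Gaussian response of every allocated unit to input x.
        return [math.exp(-sum((a - b) ** 2 for a, b in zip(x, c)) / (w * w))
                for c, w in zip(self.centers, self.widths)]

    def predict(self, x):
        return self.bias + sum(h * a for h, a in zip(self.heights, self._activations(x)))

    def observe(self, x, target):
        error = target - self.predict(x)
        nearest = min((math.dist(x, c) for c in self.centers), default=float("inf"))
        if abs(error) > self.err_tol and nearest > self.dist_tol:
            # Poor performance on an unusual pattern: allocate a unit centered on x
            # whose height equals the current error, so the output at x becomes target.
            width = self.width_scale * (nearest if math.isfinite(nearest) else self.dist_tol)
            self.centers.append(tuple(x))
            self.widths.append(width)
            self.heights.append(error)
            return "allocated"
        # Otherwise: one LMS gradient step on the bias and the unit heights.
        acts = self._activations(x)
        self.bias += self.lr * error
        self.heights = [h + self.lr * error * a for h, a in zip(self.heights, acts)]
        return "updated"


if __name__ == "__main__":
    net = AllocatingRBFNet()
    for step in range(200):
        x = (step % 50) / 50.0
        net.observe((x,), math.sin(2 * math.pi * x))
    print(len(net.centers), "units allocated")
```

Because a new unit has activation 1 at its own center, setting its height to the current error makes the network reproduce the presented response at that point, which is the "memorization" half of the scheme.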
Generalization Properties of Radial Basis Functions
Botros, Sherif M., Atkeson, Christopher G.
Sherif M. Botros and Christopher G. Atkeson, Brain and Cognitive Sciences Department and the Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139. Abstract: We examine the ability of radial basis functions (RBFs) to generalize. We compare the performance of several types of RBFs. We use the inverse dynamics of an idealized two-joint arm as a test case. We find that without a proper choice of a norm for the inputs, RBFs have poor generalization properties. A simple global scaling of the input variables greatly improves performance.
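To illustrate why the choice of input norm matters, the sketch below evaluates a Gaussian RBF under a per-dimension scaled Euclidean norm. The abstract does not say which scaling was used; dividing each input by its standard deviation is an assumption made here, and `fit_scales` and `gaussian_rbf` are hypothetical helper names.

```python
import math
import statistics


def fit_scales(samples):
    """Per-dimension population standard deviation (an assumed choice of global scaling)."""
    # Fall back to 1.0 for constant dimensions to avoid dividing by zero.
    return [statistics.pstdev(column) or 1.0 for column in zip(*samples)]


def gaussian_rbf(x, center, scales, width=1.0):
    """Gaussian RBF whose distance divides each coordinate difference by its scale."""
    d2 = sum(((a - b) / s) ** 2 for a, b, s in zip(x, center, scales))
    return math.exp(-d2 / (2.0 * width ** 2))


if __name__ == "__main__":
    # Two inputs with very different ranges, e.g. an angle and a velocity.
    samples = [(0.0, 0.0), (1.0, 100.0)]
    scales = fit_scales(samples)
    print(gaussian_rbf((1.0, 0.0), (0.0, 0.0), scales))
    print(gaussian_rbf((0.0, 100.0), (0.0, 0.0), scales))
```

Without the scaling, the input with the larger numeric range would dominate the distance and the unit would be effectively insensitive to the other input.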
Speech Recognition Using Connectionist Approaches
This paper is a summary of SPRINT project aims and results. The project focuses on the use of neuro-computing techniques to tackle various problems that remain unsolved in speech recognition. First results concern the use of feedforward nets for phonetic unit classification, isolated word recognition, and speaker adaptation.
An Analog VLSI Splining Network
Schwartz, Daniel B., Samalam, Vijay K.
We have produced a VLSI circuit capable of learning to approximate arbitrary smooth functions of a single variable using a technique closely related to splines. The circuit effectively has 512 knots spaced on a uniform grid and has full support for learning. The circuit can also be used to approximate multi-variable functions as sums of splines. An interesting and as yet nearly untapped set of applications for VLSI implementation of neural network learning systems can be found in adaptive control and nonlinear signal processing. In most such applications, the learning task consists of approximating a real function of a small number of continuous variables from discrete data points.
Kohonen Networks and Clustering: Comparative Performance in Color Clustering
Snyder, Wesley, Nissman, Daniel, Bout, David Van den, Bilbro, Griff
"vector quantization", and "unsupervised learning" are all words which descn'be the same process: assigning a few exemplars to represent a large set of samples. Perfonning that process is the subject of a substantial body of literature. In this paper, we are concerned with the comparison of various clustering techniques to a particular, practical application: color clustering. The color clustering problem is as follows: an image is recorded in full color -- that is, three components, RED, GREEN, and BLUE, each of which has been measured to 8 bits of precision. Thus, each pixel is a 24 bit quantity. We must find a representation in which 2563 possible colors are represented by only 8 bits per pixel. That is, for a problem with 256000 variables (512 x 512) variables, assign each variable to one of only 256 classes. The color clustering problem is currently of major economic interest since millions of display systems are sold each year which can only store 8 bits per pixel, but on which users would like to be able to display "true" color (or at least as near true color as possible). In this study, we have approached the problem using the standard techniques from the literature (including k-means -- ISODATA clustering[1,3,61, LBG[4]), competitive learning (referred to as CL herein) [2], and Kohonen feature maps [5,7,9].
A Comparative Study of the Practical Characteristics of Neural Network and Conventional Pattern Classifiers
Ng, Kenney, Lippmann, Richard P.
Seven different neural network and conventional pattern classifiers were compared using artificial and speech recognition tasks. High order polynomial GMDH classifiers typically provided intermediate error rates and often required long training times and large amounts of memory. In addition, the decision regions formed did not generalize well to regions of the input space with little training data. Radial basis function classifiers generalized well in high dimensional spaces, and provided low error rates with training times that were much less than those of back-propagation classifiers (Lee and Lippmann, 1989). Gaussian mixture classifiers provided good performance when the numbers and types of mixtures were selected carefully to model class densities well. Linear tree classifiers were the most computationally efficient but performed poorly with high dimensionality inputs and when the number of training patterns was small. KD-tree classifiers reduced classification time by a factor of four over conventional KNN classifiers for low-dimensional (2-input) problems. They provided little or no reduction in classification time for high-dimensional (22-input) problems. Improved condensed KNN classifiers reduced memory requirements over conventional KNN classifiers by a factor of two to fifteen for all problems, without increasing the error rate significantly.
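As an illustration of how condensing reduces the memory a nearest-neighbor classifier needs, the sketch below implements the basic condensed nearest-neighbor idea: keep only the training samples that the current stored subset misclassifies. This is the classic condensing rule with 1-NN, not the improved variant evaluated in the paper, and the function names are hypothetical.

```python
import math


def nn_label(x, store):
    """Label of the stored sample nearest to x (1-NN, Euclidean distance)."""
    return min(store, key=lambda item: math.dist(x, item[0]))[1]


def condense(samples):
    """Return a subset of (point, label) pairs that still classifies all samples correctly."""
    store = [samples[0]]
    changed = True
    while changed:
        changed = False
        for point, label in samples:
            # Add a sample only if the current subset gets it wrong.
            if nn_label(point, store) != label:
                store.append((point, label))
                changed = True
    return store


if __name__ == "__main__":
    data = [((float(i),), 0) for i in range(5)] + [((float(i),), 1) for i in range(10, 15)]
    subset = condense(data)
    print(len(subset), "of", len(data), "samples retained")
```

The loop stops only after a full pass adds nothing, so the retained subset classifies every training sample correctly, provided no two identical points carry different labels.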
Comparison of three classification techniques: CART, C4.5 and Multi-Layer Perceptrons
After introductory remarks on the classification problem as considered in various research communities, and a discussion of the reasons for assessing the performance of the three chosen algorithms, viz., CART (Classification and Regression Trees), C4.5 (one of the more recent versions of a popular induction tree technique known as ID3), and a multi-layer perceptron (MLP), this paper compares the performance of these algorithms under two criteria: classification and generalisation. It is found that, in general, the MLP has better classification and generalisation accuracies than the other two algorithms. 1 Introduction Classification of data into categories has been pursued by a number of research communities, viz., applied statistics, knowledge acquisition, and neural networks. In applied statistics, there are a number of techniques, e.g., clustering algorithms (see e.g., Hartigan) and CART (Classification and Regression Trees, see e.g., Breiman et al.). Clustering algorithms are used when the underlying data naturally fall into a number of groups; the distances among groups are measured by various metrics [Hartigan]. CART [Breiman et al.] has been very popular among applied statisticians. It assumes that the underlying data can be separated into categories; the decision boundaries can either be parallel to the axes or be a linear combination of these axes (in CART and C4.5, the axes are the same as the input features). Under certain assumptions on the input data and their associated
Asymptotic slowing down of the nearest-neighbor classifier
Snapp, Robert R., Psaltis, Demetri, Venkatesh, Santosh S.
M^(2/n), for sufficiently large values of M. Here, P_∞(error) denotes the probability of error in the infinite sample limit, and is at most twice the error of a Bayes classifier. Although the value of the coefficient a depends upon the underlying probability distributions, the exponent of M is largely distribution free. We thus obtain a concise relation between a classifier's ability to generalize from a finite reference sample and the dimensionality of the feature space, as well as an analytic validation of Bellman's well-known "curse of dimensionality." 1 INTRODUCTION One of the primary tasks assigned to neural networks is pattern classification. Common applications include recognition problems dealing with speech, handwritten characters, DNA sequences, military targets, and (in this conference) sexual identity. Two fundamental concepts associated with pattern classification are generalization (how well does a classifier respond to input data it has never encountered before?) and scalability (how are a classifier's processing and training requirements affected by increasing the number of features that describe the input patterns?).