Statistical Learning
Generalization Properties of Radial Basis Functions
Botros, Sherif M., Atkeson, Christopher G.
Atkeson Brain and Cognitive Sciences Department and the Artificial Intelligence Laboratory Massachusetts Institute of Technology Cambridge, MA 02139 Abstract We examine the ability of radial basis functions (RBFs) to generalize. We compare the performance of several types of RBFs. We use the inverse dynamics ofan idealized two-joint arm as a test case. We find that without a proper choice of a norm for the inputs, RBFs have poor generalization properties. A simple global scaling of the input variables greatly improves performance.
An Analog VLSI Splining Network
Schwartz, Daniel B., Samalam, Vijay K.
Waltham, MA 02254 Abstract We have produced a VLSI circuit capable of learning to approximate arbitrary smoothof a single variable using a technique closely related to splines. The circuit effectively has 512 knots space on a uniform grid and has full support for learning. The circuit also can be used to approximate multi-variable functions as sum of splines. An interesting, and as of yet, nearly untapped set of applications for VLSI implementation ofneural network learning systems can be found in adaptive control and nonlinear signal processing. In most such applications, the learning task consists of approximating a real function of a small number of continuous variables from discrete data points.
Comparison of three classification techniques: CART, C4.5 and Multi-Layer Perceptrons
In this paper, after some introductory remarks into the classification problem asconsidered in various research communities, and some discussions concerning some of the reasons for ascertaining the performances of the three chosen algorithms, viz., CART (Classification and Regression Tree), C4.5 (one of the more recent versions of a popular induction tree technique knownas ID3), and a multi-layer perceptron (MLP), it is proposed to compare the performances of these algorithms under two criteria: classification andgeneralisation. It is found that, in general, the MLP has better classification and generalisation accuracies compared with the other two algorithms. 1 Introduction Classification of data into categories has been pursued by a number of research communities, viz., applied statistics, knowledge acquisition, neural networks. In applied statistics, there are a number of techniques, e.g., clustering algorithms (see e.g., Hartigan), CART (Classification and Regression Trees, see e.g., Breiman et al). Clustering algorithms are used when the underlying data naturally fall into a number of groups, the distance among groups are measured by various metrics [Hartigan]. CART[Breiman, et all has been very popular among applied statisticians. It assumes that the underlying data can be separated into categories, the decision boundaries can either be parallel to the axis or they can be a linear combination of these axes!.
Asymptotic slowing down of the nearest-neighbor classifier
Snapp, Robert R., Psaltis, Demetri, Venkatesh, Santosh S.
Santosh S. Venkatesh Electrical Engineering University of Pennsylvania Philadelphia, PA 19104 If patterns are drawn from an n-dimensional feature space according to a probability distribution that obeys a weak smoothness criterion, we show that the probability that a random input pattern is misclassified by a nearest-neighbor classifier using M random reference patterns asymptotically satisfies a PM(error) "" Poo(error) M2/n' for sufficiently large values of M. Here, Poo(error) denotes the probability of error in the infinite sample limit, and is at most twice the error of a Bayes classifier. Although the value of the coefficient a depends upon the underlying probability distributions, the exponent of M is largely distribution free.We thus obtain a concise relation between a classifier's ability to generalize from a finite reference sample and the dimensionality of the feature space, as well as an analytic validation of Bellman's well known "curse of dimensionality." 1 INTRODUCTION One of the primary tasks assigned to neural networks is pattern classification.
Note on Learning Rate Schedules for Stochastic Optimization
Darken, Christian, Moody, John E.
We present and compare learning rate schedules for stochastic gradient descent, a general algorithm which includes LMS, online backpropagation andk-means clustering as special cases. We introduce "search-thenconverge" typeschedules which outperform the classical constant and "running average" (1ft) schedules both in speed of convergence and quality of solution.
Using Genetic Algorithms to Improve Pattern Classification Performance
Chang, Eric I., Lippmann, Richard P.
Feature selection and creation are two of the most important and difficult tasks in the field of pattern classification. Good features improve the performance of both conventional and neural network pattern classifiers. Exemplar selection is another task that can reduce the memory and computation requirements of a KNN classifier. These three tasks require a search through a space which is typically so large that 797 798 Chang and Lippmann exhaustive search is impractical. The purpose of this research was to explore the usefulness of Genetic search algorithms for these tasks. Details concerning this research are available in (Chang, 1990).
A Framework for the Cooperation of Learning Algorithms
Bottou, Léon, Gallinari, Patrick
We introduce a framework for training architectures composed of several modules. This framework, which uses a statistical formulation of learning systems, provides a unique formalism for describing many classical connectionist algorithms as well as complex systems where several algorithms interact. It allows to design hybrid systems which combine the advantages of connectionist algorithms as well as other learning algorithms.
Computing with Arrays of Bell-Shaped and Sigmoid Functions
Bell-shaped response curves are commonly found in biological neurons whenever a natural metric exist on the corresponding relevant stimulus variable (orientation, position in space, frequency, time delay, ...). As a result, they are often used in neural models in different context ranging from resolution enhancement and interpolation tolearning (see, for instance, Baldi et al. (1988), Moody et al. (1989) *and Division of Biology, California Institute of Technology. The complete title of this paper should read: "Computing with arrays of bell-shaped and sigmoid functions.