Statistical Learning
Data Analysis using G/SPLINES
G/SPLINES is an algorithm for building functional models of data. It uses genetic search to discover combinations of basis functions which are then used to build a least-squares regression model. Because it produces a population of models which evolve over time rather than a single model, it allows analysis not possible with other regression-based approaches. 1 INTRODUCTION G/SPLINES is a hybrid of Friedman's Multivariable Adaptive Regression Splines (MARS) algorithm (Friedman, 1990) with Holland's Genetic Algorithm (Holland, 1975). G/SPLINES has advantages over MARS in that it requires fewer least-squares computations, is easily extendable to non-spline basis functions, may discover models inaccessible to local-variable selection algorithms, and allows significantly larger problems to be considered. These issues are discussed in (Rogers, 1991). This paper begins with a discussion of linear regression models, followed by a description of the G/SPLINES algorithm, and finishes with a series of experiments illustrating its performance, robustness, and analysis capabilities.
Node Splitting: A Constructive Algorithm for Feed-Forward Neural Networks
The small network forms an approximate model of a set of training data, and the split creates a larger more powerful network which is initialised with the approximate solution already found. The insufficiency of the smaller network in modelling the system which generated the data leads to oscillation in those hidden nodes whose weight vectors cover regions inthe input space where more detail is required in the model. These nodes are identified and split in two using principal component analysis, allowing the new nodes t.o cover the two main modes of each oscillating vector. Nodes are selected for splitting using principal component analysis on the oscillating weight vectors, or by examining the Hessian matrix of second derivatives of the network error with respect to the weight.s.
Best-First Model Merging for Dynamic Learning and Recognition
Stephen M. Omohundro International Computer Science Institute 1947 CenteJ' Street, Suite 600 Berkeley, California 94704 Abstract "Best-first model merging" is a general technique for dynamically choosing the structure of a neural or related architecture while avoiding overfitting.It is applicable to both leaming and recognition tasks and often generalizes significantly better than fixed structures. We demonstrate theapproach applied to the tasks of choosing radial basis functions for function learning, choosing local affine models for curve and constraint surface modelling, and choosing the structure of a balltree or bumptree to maximize efficiency of access. 1 TOWARD MORE COGNITIVE LEARNING Standard backpropagation neural networks learn in a way which appears to be quite different fromhuman leaming. Viewed as a cognitive system, a standard network always maintains acomplete model of its domain. This model is mostly wrong initially, but gets gradually better and better as data appears. The net deals with all data in much the same way and has no representation for the strength of evidence behind a certain conclusion. The network architecture is usually chosen before any data is seen and the processing is much the same in the early phases of learning as in the late phases.
Principles of Risk Minimization for Learning Theory
Learning is posed as a problem of function estimation, for which two principles ofsolution are considered: empirical risk minimization and structural risk minimization. These two principles are applied to two different statements ofthe function estimation problem: global and local. Systematic improvements in prediction power are illustrated in application to zip-code recognition.
Neural Network Analysis of Event Related Potentials and Electroencephalogram Predicts Vigilance
Venturini, Rita, Lytton, William W., Sejnowski, Terrence J.
Automated monitoring of vigilance in attention intensive tasks such as air traffic control or sonar operation is highly desirable. As the operator monitorsthe instrument, the instrument would monitor the operator, insuring against lapses. We have taken a first step toward this goal by using feedforwardneural networks trained with backpropagation to interpret event related potentials (ERPs) and electroencephalogram (EEG) associated withperiods of high and low vigilance. The accuracy of our system on an ERP data set averaged over 28 minutes was 96%, better than the 83% accuracy obtained using linear discriminant analysis. Practical vigilance monitoring will require prediction over shorter time periods. We were able to average the ERP over as little as 2 minutes and still get 90% correct prediction of a vigilance measure. Additionally, we achieved similarly good performance using segments of EEG power spectrum as short as 56 sec.
Knowledge Discovery in Databases: An Overview
Frawley, William J., Piatetsky-Shapiro, Gregory, Matheus, Christopher J.
After a decade of fundamental interdisciplinary research in machine learning, the spadework in this field has been done; the 1990s should see the widespread exploitation of knowledge discovery as an aid to assembling knowledge bases. The contributors to the AAAI Press book Knowledge Discovery in Databases were excited at the potential benefits of this research. The editors hope that some of this excitement will communicate itself to "AI Magazine readers of this article.
A training algorithm for optimal margin classifiers
Boser, B. | Guyon, I. | Vapnik, V. N.
A training algorithm that maximizes the margin between the training patterns and the decision boundary is presented. The technique is applicable to a wide variety of the classification functions, including Perceptrons, polynomials, and Radial Basis Functions. The effective number of parameters is adjusted automatically to match the complexity of the problem. The solution is expressed as a linear combination of supporting patterns. These are the subset of training patterns that are closest to the decision boundary.