Best-First Model Merging for Dynamic Learning and Recognition
Stephen M. Omohundro International Computer Science Institute 1947 Center Street, Suite 600 Berkeley, California 94704 Abstract "Best-first model merging" is a general technique for dynamically choosing the structure of a neural or related architecture while avoiding overfitting. It is applicable to both learning and recognition tasks and often generalizes significantly better than fixed structures. We demonstrate the approach applied to the tasks of choosing radial basis functions for function learning, choosing local affine models for curve and constraint surface modelling, and choosing the structure of a balltree or bumptree to maximize efficiency of access. 1 TOWARD MORE COGNITIVE LEARNING Standard backpropagation neural networks learn in a way which appears to be quite different from human learning. Viewed as a cognitive system, a standard network always maintains a complete model of its domain. This model is mostly wrong initially but improves gradually as data arrives. The net treats all data in much the same way and has no representation of the strength of evidence behind a given conclusion. The network architecture is usually chosen before any data is seen, and processing is much the same in the early phases of learning as in the late phases.
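The abstract describes best-first model merging only at a high level. As a rough illustration of the general idea, here is a minimal Python sketch under assumptions of our own: the data are 1-D and already ordered by x, each model is a constant (the mean of the points it covers), and the cost of merging two adjacent models is the resulting increase in squared error. The names `Segment`, `merge_cost`, `merge`, `best_first_merge`, and the `threshold` stopping rule are hypothetical and not taken from the paper, which applies merging to richer models such as radial basis functions and local affine models.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: int    # index of the first point covered by this model
    end: int      # index one past the last point covered
    n: int        # number of points covered
    mean: float   # the constant model: mean of the covered y values


def merge_cost(a, b):
    """Increase in total squared error if two constant models are replaced by one."""
    return a.n * b.n / (a.n + b.n) * (a.mean - b.mean) ** 2


def merge(a, b):
    """Combine two adjacent constant models into one covering both ranges."""
    n = a.n + b.n
    return Segment(a.start, b.end, n, (a.n * a.mean + b.n * b.mean) / n)


def best_first_merge(ys, threshold):
    """Start with one model per point; repeatedly merge the cheapest adjacent
    pair until every remaining merge would cost more than `threshold`.
    ys must already be ordered by the input variable x."""
    segs = [Segment(i, i + 1, 1, float(y)) for i, y in enumerate(ys)]
    while len(segs) > 1:
        costs = [merge_cost(segs[i], segs[i + 1]) for i in range(len(segs) - 1)]
        best = min(range(len(costs)), key=costs.__getitem__)
        if costs[best] > threshold:
            break  # every remaining merge would hurt the fit too much
        segs[best:best + 2] = [merge(segs[best], segs[best + 1])]
    return segs


ys = [0.1, -0.1, 0.0, 0.2, -0.2, 10.1, 9.9, 10.0, 10.2, 9.8]
for s in best_first_merge(ys, threshold=1.0):
    print(s.start, s.end, s.n, s.mean)
```

Merges inside each flat region cost almost nothing, so they happen first; merging the two plateaus would raise the squared error by about 250, which exceeds the threshold, so the sketch stops with two models.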
Neural Computing with Small Weights
Siu, Kai-Yeung, Bruck, Jehoshua
Kai-Yeung Siu Dept. of Electrical & Computer Engineering University of California, Irvine Irvine, CA 92717 Jehoshua Bruck IBM Research Division Almaden Research Center San Jose, CA 95120-6099 Abstract An important issue in neural computation is the dynamic range of weights in the neural networks. Many experimental results on learning indicate that the weights in the networks can grow prohibitively large with the size of the inputs. Here we address this issue by studying the tradeoffs between the depth and the size of weights in polynomial-size networks of linear threshold elements (LTEs). We show that there is an efficient way of simulating a network of LTEs with large weights by a network of LTEs with small weights. To prove these results, we use tools from harmonic analysis of Boolean functions.
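To make the weight-size issue concrete, the following Python sketch (an illustration of our own choosing, not the paper's construction) implements a single linear threshold element and uses it to decide whether one n-bit integer is at least another, with weight +2^i on bit i of the first number and -2^i on bit i of the second. The helper names are hypothetical.

```python
def lte(weights, threshold, x):
    """Linear threshold element: output 1 iff sum_i w_i * x_i >= threshold."""
    return int(sum(w * xi for w, xi in zip(weights, x)) >= threshold)


def to_bits(v, n):
    """Little-endian list of the n low-order bits of v."""
    return [(v >> i) & 1 for i in range(n)]


def comparison_weights(n):
    """Weights on inputs a_0..a_{n-1}, b_0..b_{n-1}: +2^i for a_i, -2^i for b_i."""
    return [2 ** i for i in range(n)] + [-(2 ** i) for i in range(n)]


def compare(a, b, n):
    """Decide a >= b for n-bit integers with a single LTE (threshold 0)."""
    return lte(comparison_weights(n), 0, to_bits(a, n) + to_bits(b, n))


for n in (4, 8, 16, 32):
    print(n, max(abs(w) for w in comparison_weights(n)))
```

The largest weight in this representation is 2^(n-1), so it doubles with every extra input bit. The paper's method for simulating such large-weight networks with small-weight ones is not reproduced here.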
Some Approximation Properties of Projection Pursuit Learning Networks
Zhao, Ying, Atkeson, Christopher G.
Ying Zhao Christopher G. Atkeson The Artificial Intelligence Laboratory Massachusetts Institute of Technology Cambridge, MA 02139 Abstract This paper addresses an important question in machine learning: which network architectures work better on which kinds of problems? A projection pursuit learning network has a structure very similar to that of a one-hidden-layer sigmoidal neural network. A general method based on a continuous version of projection pursuit regression is developed to show that projection pursuit regression works better on angular smooth functions than on Laplacian smooth functions. There exists a ridge function approximation scheme that avoids the curse of dimensionality for approximating functions in $L^2(\mathbb{R}^d)$. 1 INTRODUCTION Projection pursuit is a nonparametric statistical technique for finding "interesting" low-dimensional projections of high-dimensional data sets. It has been used for nonparametric fitting and other data-analytic purposes (Friedman and Stuetzle, 1981; Huber, 1985).
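Projection pursuit regression approximates a function by a sum of ridge functions, $f(x) \approx \sum_k g_k(a_k \cdot x)$, much as a one-hidden-layer network sums sigmoids of linear projections. The Python sketch below fits only a single term to 2-D data, and its choices are assumptions for illustration: a grid search over directions instead of a numerical optimizer, and binned means as the smoother g. It is not the continuous formulation studied in the paper, and `fit_ridge_term` is a hypothetical name.

```python
import math
import random


def fit_ridge_term(xs, ys, n_dirs=180, n_bins=20):
    """Fit one projection pursuit term g(a . x) to 2-D data.

    Directions a = (cos t, sin t) are searched on a grid over [0, pi); for each,
    g is estimated as the mean of y within equal-width bins of z = a . x.
    Returns (a, residual sum of squares) for the best direction.
    """
    best = None
    for k in range(n_dirs):
        t = math.pi * k / n_dirs
        a = (math.cos(t), math.sin(t))
        z = [a[0] * x0 + a[1] * x1 for x0, x1 in xs]
        lo, hi = min(z), max(z)
        width = (hi - lo) / n_bins or 1.0   # guard against all z being equal
        bins = [min(int((zi - lo) / width), n_bins - 1) for zi in z]
        sums, counts = [0.0] * n_bins, [0] * n_bins
        for b, y in zip(bins, ys):
            sums[b] += y
            counts[b] += 1
        g = [s / c if c else 0.0 for s, c in zip(sums, counts)]
        rss = sum((y - g[b]) ** 2 for b, y in zip(bins, ys))
        if best is None or rss < best[1]:
            best = (a, rss)
    return best


rng = random.Random(0)
xs = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(500)]
ys = [(x0 + x1) ** 2 for x0, x1 in xs]   # a ridge function along (1, 1) / sqrt(2)
a, rss = fit_ridge_term(xs, ys)
print(a, rss)
```

Because the target here really is a ridge function along (1, 1)/√2, the selected direction should land close to 45 degrees. Further terms would be fitted the same way to the residuals.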
Incrementally Learning Time-varying Half-planes
Kuh, Anthony, Petsche, Thomas, Rivest, Ronald L.
For a dichotomy, concept drift means that the classification function changes over time. We want to extend the theoretical analyses of learning to include time-varying concepts, to explore the behavior of current learning algorithms in the face of concept drift, and to devise tracking algorithms that better handle concept drift. In this paper, we briefly describe our theoretical model and then present the results of simulations
*kuh@wiliki.eng.hawaii.edu
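As a toy picture of tracking a drifting concept, the Python sketch below lets a target half-plane through the origin rotate at a constant rate and compares two online learners: a perceptron that renormalizes its weight vector after each mistake-driven update, and a classifier frozen at the initial concept. The rotation schedule, the uniform input distribution, the step size `eta`, and all function names are assumptions for illustration; this is a generic tracker, not one of the algorithms analyzed in the paper.

```python
import math
import random


def sign(v):
    return 1 if v >= 0 else -1


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]


def simulate(steps=4000, period=2000, eta=0.1, seed=0):
    """Online mistake rates of a tracking perceptron and of a frozen classifier
    when the target half-plane (through the origin) rotates at a constant rate."""
    rng = random.Random(seed)
    w = [1.0, 0.0]        # tracker: starts equal to the target's normal at t = 0
    frozen = (1.0, 0.0)   # static baseline: the initial concept, never updated
    tracker_mistakes = frozen_mistakes = 0
    for t in range(steps):
        phi = 2 * math.pi * t / period             # drifting target normal
        target = (math.cos(phi), math.sin(phi))
        x = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        y = sign(dot(target, x))
        if sign(dot(frozen, x)) != y:
            frozen_mistakes += 1
        if sign(dot(w, x)) != y:                   # predict first, then learn
            tracker_mistakes += 1
            w = [wi + eta * y * xi for wi, xi in zip(w, x)]
            norm = math.hypot(w[0], w[1])          # keep w at unit length
            w = [wi / norm for wi in w]
    return tracker_mistakes / steps, frozen_mistakes / steps


print(simulate())
```

Renormalizing keeps each correction's angular size roughly fixed; without it, a plain perceptron's weight norm grows, later updates turn the boundary less, and tracking slows. Over whole rotations the frozen classifier is wrong about half the time, while the tracker's mistake rate stays well below that.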
Tangent Prop - A formalism for specifying selected invariances in an adaptive network
Simard, Patrice, Victorri, Bernard, LeCun, Yann, Denker, John
In many machine learning applications, one has access not only to training data but also to some high-level a priori knowledge about the desired behavior of the system. For example, it is known in advance that the output of a character recognizer should be invariant with respect to small spatial distortions of the input images (translations, rotations, scale changes, etcetera). We have implemented a scheme that allows a network to learn the derivative of its outputs with respect to distortion operators of our choosing. This not only reduces the learning time and the amount of training data, but also provides a powerful language for specifying what generalizations we wish the network to perform. 1 INTRODUCTION In machine learning, one very often knows more about the function to be learned than just the training data. An interesting case is when certain directional derivatives of the desired function are known at certain points.
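One way to express this kind of a priori knowledge is a penalty on the derivative of the output along a distortion. The Python sketch below is a minimal illustration assuming 2-D inputs and rotation as the distortion operator: it estimates the directional derivative d/dα f(s(x, α)) at α = 0 by a central finite difference and sums its square over a set of points. The finite difference is a stand-in for however a real implementation computes this derivative inside the network, and all names here are hypothetical.

```python
import math


def rotate(x, alpha):
    """Rotate a 2-D input point by angle alpha (the distortion operator s(x, alpha))."""
    c, s = math.cos(alpha), math.sin(alpha)
    return (c * x[0] - s * x[1], s * x[0] + c * x[1])


def tangent_derivative(f, x, transform, eps=1e-5):
    """Central-difference estimate of d/d(alpha) f(transform(x, alpha)) at alpha = 0."""
    return (f(transform(x, eps)) - f(transform(x, -eps))) / (2 * eps)


def tangent_penalty(f, xs, transform):
    """Sum of squared directional derivatives; zero when f is invariant to the transform."""
    return sum(tangent_derivative(f, x, transform) ** 2 for x in xs)


def radius_sq(x):    # invariant under rotation about the origin
    return x[0] ** 2 + x[1] ** 2


def first_coord(x):  # not invariant under rotation
    return x[0]


xs = [(1.0, 0.0), (0.5, -2.0), (3.0, 1.0)]
print(tangent_penalty(radius_sq, xs, rotate))
print(tangent_penalty(first_coord, xs, rotate))
```

In training, a penalty like this would be added to the usual data loss with some weighting factor (a hypothetical hyperparameter here), pushing the network toward outputs that change little under small rotations. For the invariant `radius_sq` the penalty is essentially zero, while for `first_coord` the derivative at each point is -x_1, giving a penalty of 5 on the three sample points.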