Support Vector Regression Machines
Drucker, Harris, Burges, Christopher J. C., Kaufman, Linda, Smola, Alex J., Vapnik, Vladimir
A new regression technique based on Vapnik's concept of support vectors is introduced. We compare support vector regression (SVR) with a committee regression technique (bagging) based on regression trees and ridge regression done in feature space. On the basis of these experiments, it is expected that SVR will have advantages in high dimensionality space because SVR optimization does not depend on the dimensionality of the input space.
Combining Neural Network Regression Estimates with Regularized Linear Weights
Merz, Christopher J., Pazzani, Michael J.
When combining a set of learned models to form an improved estimator, the issue of redundancy or multicollinearity in the set of models must be addressed. A progression of existing approaches and their limitations with respect to redundancy is discussed. A new approach, PCR*, based on principal components regression is proposed to address these limitations. An evaluation of the new approach on a collection of domains reveals that: 1) PCR* was the most robust combination method as the redundancy of the learned models increased, 2) redundancy could be handled without eliminating any of the learned models, and 3) the principal components of the learned models provided a continuum of "regularized" weights from which PCR* could choose.
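As a rough illustration of the idea of combining models through principal components regression, the sketch below regresses the target on the first k principal components of the models' predictions and maps the result back to one weight per model. It is not the authors' PCR* procedure: the function name `pcr_weights`, the use of power iteration with deflation, and the hand-picked `k` are all assumptions made for this example (the abstract only says that PCR* chooses among the weight sets the components provide).

```python
import math

def pcr_weights(preds, y, k, iters=200):
    """Hypothetical helper: preds[i][j] is model j's prediction on example i.

    Returns (weights, intercept) for a linear combination of the models
    built from the first k principal components of their predictions.
    """
    n, m = len(preds), len(preds[0])
    col_mean = [sum(row[j] for row in preds) / n for j in range(m)]
    y_mean = sum(y) / n
    F = [[row[j] - col_mean[j] for j in range(m)] for row in preds]
    yc = [t - y_mean for t in y]
    # Covariance matrix of the model predictions (m x m).
    C = [[sum(F[i][a] * F[i][b] for i in range(n)) / n for b in range(m)]
         for a in range(m)]
    weights = [0.0] * m
    for _component in range(k):
        # Power iteration for the leading eigenvector of the (deflated) C.
        v = [1.0 / (j + 1) for j in range(m)]
        scale = math.sqrt(sum(x * x for x in v))
        v = [x / scale for x in v]
        lam = 0.0
        for _step in range(iters):
            w = [sum(C[a][b] * v[b] for b in range(m)) for a in range(m)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                lam = 0.0
                break
            v = [x / norm for x in w]
            lam = norm
        if lam < 1e-12:
            break  # no variance left: remaining components are redundant
        # Regress the centered target on this component's scores.
        z = [sum(F[i][j] * v[j] for j in range(m)) for i in range(n)]
        beta = sum(zi * ti for zi, ti in zip(z, yc)) / sum(zi * zi for zi in z)
        weights = [wj + beta * vj for wj, vj in zip(weights, v)]
        # Deflate so the next pass finds the next component.
        C = [[C[a][b] - lam * v[a] * v[b] for b in range(m)] for a in range(m)]
    intercept = y_mean - sum(w * mu for w, mu in zip(weights, col_mean))
    return weights, intercept

# Two perfectly redundant models: ordinary least squares has no unique
# solution here, while the single retained component splits the weight.
weights, intercept = pcr_weights(
    [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]], [1.0, 2.0, 3.0, 4.0], k=1)
```

Varying `k` from 1 up to the number of models gives a sequence of progressively less regularized weight vectors, which is one way to read the "continuum" mentioned in the abstract.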
For Valid Generalization the Size of the Weights is More Important than the Size of the Network
Baum and Haussler [4] used these results to give sample size bounds for multi-layer threshold networks that grow at least as quickly as the number of weights (see also [7]). However, for pattern classification applications the VC-bounds seem loose; neural networks often perform successfully with training sets that are considerably smaller than the number of weights. This paper shows that for classification problems on which neural networks perform well, if the weights are not too big, the size of the weights determines the generalization performance. In contrast with the function classes and algorithms considered in the VC-theory, neural networks used for binary classification problems have real-valued outputs, and learning algorithms typically attempt to minimize the squared error of the network output over a training set. As well as encouraging the correct classification, this tends to push the output away from zero and towards the target values of {-1, 1}.
Time Series Prediction using Mixtures of Experts
Zeevi, Assaf J., Meir, Ron, Adler, Robert J.
We consider the problem of prediction of stationary time series, using the architecture known as mixtures of experts (MEM). Here we suggest a mixture which blends several autoregressive models. This study focuses on some theoretical foundations of the prediction problem in this context. More precisely, it is demonstrated that this model is a universal approximator, with respect to learning the unknown prediction function. This statement is strengthened as upper bounds on the mean squared error are established. Based on these results it is possible to compare the MEM to other families of models (e.g., neural networks and state dependent models). It is shown that a degenerate version of the MEM is in fact equivalent to a neural network, and the number of experts in the architecture plays a similar role to the number of hidden units in the latter model.
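To make the architecture concrete, here is a minimal sketch of a one-step prediction from a mixture of autoregressive experts. The softmax gate that is affine in the lagged values is the usual mixtures-of-experts form and is an assumption here, since the abstract does not spell out the gating function; the names `mem_predict`, `experts`, and `gates` are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mem_predict(history, experts, gates):
    """Blend several AR models into one prediction.

    history: the last d observations, in the order the coefficients expect.
    experts: list of (coeffs, bias) pairs, one AR(d) model per expert.
    gates:   list of (coeffs, bias) pairs giving each expert's gating score
             (assumed affine in the same lagged values).
    """
    def affine(params):
        coeffs, bias = params
        return bias + sum(c * x for c, x in zip(coeffs, history))

    mixing = softmax([affine(g) for g in gates])
    return sum(g * affine(e) for g, e in zip(mixing, experts))

# Two experts with identical gating scores are averaged equally.
prediction = mem_predict(
    history=[1.0, 2.0],
    experts=[([0.5, 0.2], 0.1), ([0.0, 0.0], 3.0)],
    gates=[([0.0, 0.0], 0.0), ([0.0, 0.0], 0.0)],
)
```

In this sketch the number of entries in `experts` plays the role the abstract compares to the number of hidden units in a neural network.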
ARTEX: A Self-organizing Architecture for Classifying Image Regions
Grossberg, Stephen, Williamson, James R.
Automatic processing of visual scenes often begins by detecting regions of an image with common values of simple local features, such as texture, and mapping the pattern of feature activation into a predicted region label. We develop a self-organizing neural architecture, called the ARTEX algorithm, for automatically extracting a novel and effective array of such features and mapping them to output region labels. ARTEX is made up of biologically motivated networks, the Boundary Contour System and Feature Contour System (BCS/FCS) networks for visual feature extraction (Cohen & Grossberg, 1984; Grossberg & Mingolla, 1985a, 1985b; Grossberg & Todorovic, 1988; Grossberg, Mingolla, & Williamson, 1995), and the Gaussian ARTMAP (GAM) network for classification (Williamson, 1996). ARTEX is first evaluated on a difficult real-world task, classifying regions of synthetic aperture radar (SAR) images, where it reliably achieves high resolution (single pixel) classification results, and creates accurate probability maps for its class predictions. ARTEX is then evaluated on classification of natural textures, where it outperforms the texture classification system in Greenspan, Goodman, Chellappa, & Anderson (1994) using comparable preprocessing and training conditions.
Effective Training of a Neural Network Character Classifier for Word Recognition
Yaeger, Larry S., Lyon, Richard F., Webb, Brandyn J.
We have been conducting research on bottom-up classification techniques based on trainable artificial neural networks (ANNs), in combination with comprehensive but weakly-applied language models. To focus our work on a subproblem that is tractable enough to lead to usable products in a reasonable time, we have restricted the domain to hand-printing, so that strokes are clearly delineated by pen lifts. In the process of optimizing overall performance of the recognizer, we have discovered some useful techniques for architecting and training ANNs that must participate in a larger recognition process. Some of these techniques, especially the normalization of output error, frequency balancing, and error emphasis, suggest a common theme of significant value derived by reducing the effect of a priori biases in training data to better represent low frequency, low probability samples, including second and third choice probabilities. There is ample prior work in combining low-level classifiers with various search strategies to provide integrated segmentation and recognition for writing (Tappert et al. 1990) and speech (Renals et al. 1992). And there is a rich background in the use of ANNs as classifiers, including their use as a low-level, character classifier in a higher-level word recognition system (Bengio et al. 1995).
Learning from Demonstration
By now it is widely accepted that learning a task from scratch, i.e., without any prior knowledge, is a daunting undertaking. Humans, however, rarely attempt to learn from scratch. They extract initial biases, as well as strategies for approaching a learning problem, from instructions and/or demonstrations of other humans. For learning control, this paper investigates how learning from demonstration can be applied in the context of reinforcement learning. We consider priming the Q-function, the value function, the policy, and the model of the task dynamics as possible areas where demonstrations can speed up learning. In general nonlinear learning problems, only model-based reinforcement learning shows significant speedup after a demonstration, while in the special case of linear quadratic regulator (LQR) problems, all methods profit from the demonstration. In an implementation of pole balancing on a complex anthropomorphic robot arm, we demonstrate that, when facing the complexities of real signal processing, model-based reinforcement learning offers the most robustness for LQR problems. Using the suggested methods, the robot learns pole balancing in just a single trial after a 30-second demonstration by the human instructor.
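One of the options listed above, priming the Q-function, could look roughly like the following in a tabular setting. This is only a sketch under stated assumptions: the abstract does not say how the priming is done, so replaying the demonstrated transitions through ordinary Q-learning updates, the function name `prime_q`, and the toy two-step demonstration are all illustrative choices rather than the paper's method.

```python
from collections import defaultdict

def prime_q(demo, actions, alpha=0.5, gamma=0.9, passes=20):
    """Initialize a tabular Q-function from demonstrated transitions.

    demo: list of (state, action, reward, next_state) tuples recorded
          from the demonstrator (hypothetical data format).
    """
    Q = defaultdict(float)  # unseen state-action pairs start at 0
    for _ in range(passes):
        # Replay backwards so reward information propagates quickly.
        for s, a, r, s2 in reversed(demo):
            target = r + gamma * max(Q[(s2, b)] for b in actions)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q

# A two-step demonstration that moves right and is rewarded at the end.
actions = ["left", "right"]
demo = [(0, "right", 0.0, 1), (1, "right", 1.0, 2)]
Q = prime_q(demo, actions)
```

After priming, the demonstrated actions already have higher values than the untried ones, so a greedy learner would start by imitating the demonstration and refine from there.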
On-line Policy Improvement using Monte-Carlo Search
Tesauro, Gerald, Galperin, Gregory R.
Policy iteration is known to have rapid and robust convergence properties, and for Markov tasks with lookup-table state-space representations, it is guaranteed to converge to the optimal policy. In typical uses of policy iteration, the policy improvement step is an extensive off-line procedure. For example, in dynamic programming, one performs a sweep through all states in the state space. Reinforcement learning provides another approach to policy improvement; recently, several authors have investigated using RL in conjunction with nonlinear function approximators to represent the value functions and/or policies (Tesauro, 1992; Crites and Barto, 1996; Zhang and Dietterich, 1996). These studies are based on following actual state-space trajectories rather than sweeps through the full state space, but are still too slow to compute improved policies in real time.
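For reference, the off-line procedure described above can be sketched for a small lookup-table task as follows. This illustrates classical policy iteration with full state-space sweeps, not the on-line Monte-Carlo method the paper proposes; the four-state chain, the deterministic `step` function, and the discount factor are assumptions chosen to keep the example short.

```python
def policy_iteration(n_states, actions, step, gamma=0.9, eval_sweeps=200):
    """Tabular policy iteration for a deterministic task.

    step(state, action) -> (next_state, reward) is a hypothetical model
    of the task; gamma < 1 keeps the value estimates bounded.
    """
    policy = {s: actions[0] for s in range(n_states)}
    while True:
        # Policy evaluation: repeated sweeps through all states.
        V = [0.0] * n_states
        for _ in range(eval_sweeps):
            for s in range(n_states):
                nxt, r = step(s, policy[s])
                V[s] = r + gamma * V[nxt]
        # Policy improvement: one greedy sweep through all states.
        stable = True
        for s in range(n_states):
            def q(a, s=s):
                nxt, r = step(s, a)
                return r + gamma * V[nxt]
            best = max(actions, key=q)
            if q(best) > q(policy[s]) + 1e-12:
                policy[s] = best
                stable = False
        if stable:
            return policy, V

N = 4  # states 0..3 on a line; reaching or staying in state 3 pays 1

def chain_step(s, a):
    nxt = min(s + 1, N - 1) if a == "right" else max(s - 1, 0)
    return nxt, (1.0 if nxt == N - 1 else 0.0)

policy, V = policy_iteration(N, ["left", "right"], chain_step)
```

Both the evaluation and the improvement step touch every state on every pass, which is the cost that makes this style of improvement impractical in real time for large state spaces.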