Explanation-Based Neural Network Learning for Robot Control

Neural Information Processing Systems

How can artificial neural nets generalize better from fewer examples? In order to generalize successfully, neural network learning methods typically require large training data sets. We introduce a neural network learning method that generalizes rationally from many fewer data points, relying instead on prior knowledge encoded in previously learned neural networks. For example, in robot control learning tasks reported here, previously learned networks that model the effects of robot actions are used to guide subsequent learning of robot control functions. For each observed training example of the target function (e.g. the robot control policy), the learner explains the observed example in terms of its prior knowledge, then analyzes this explanation to infer additional information about the shape, or slope, of the target function. This shape knowledge is used to bias generalization when learning the target function. Results are presented applying this approach to a simulated robot task based on reinforcement learning.
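The idea of using slope information to bias generalization can be illustrated with a small sketch. It assumes a scalar input, a squared-error penalty on both values and slopes, and a finite-difference estimate of the model's slope. The names `value_and_slope_loss` and `slope_weight` are hypothetical and chosen for illustration; this is not the paper's implementation.

```python
from typing import Callable, Iterable, Tuple

# Each training example carries the observed target value plus a slope that is
# assumed to have been extracted from an explanation built on prior knowledge.
Example = Tuple[float, float, float]  # (x, target_value, target_slope)


def value_and_slope_loss(
    model: Callable[[float], float],
    examples: Iterable[Example],
    slope_weight: float = 1.0,
    eps: float = 1e-4,
) -> float:
    """Sum of squared value errors plus weighted squared slope errors."""
    total = 0.0
    for x, target_value, target_slope in examples:
        value_error = model(x) - target_value
        # Central finite difference as a stand-in for the model's derivative.
        model_slope = (model(x + eps) - model(x - eps)) / (2.0 * eps)
        slope_error = model_slope - target_slope
        total += value_error ** 2 + slope_weight * slope_error ** 2
    return total


# One example: value 2.0 and slope 3.0 at x = 1.0.
examples = [(1.0, 2.0, 3.0)]
matches_value_and_slope = lambda x: 3.0 * x - 1.0
matches_value_only = lambda x: 2.0

loss_good = value_and_slope_loss(matches_value_and_slope, examples)
loss_flat = value_and_slope_loss(matches_value_only, examples)
```

Both candidate models fit the single observed value, but only the first also matches the slope, so the slope term separates them even with one data point.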


Input Reconstruction Reliability Estimation

Neural Information Processing Systems

This paper describes a technique called Input Reconstruction Reliability Estimation (IRRE) for determining the response reliability of a restricted class of multi-layer perceptrons (MLPs). The technique uses a network's ability to accurately encode the input pattern in its internal representation as a measure of its reliability. The more accurately a network is able to reconstruct the input pattern from its internal representation, the more reliable the network is considered to be. IRRE provides a good estimate of the reliability of MLPs trained for autonomous driving. Results are presented in which the reliability estimates provided by IRRE are used to select between networks trained for different driving situations. In many real-world domains it is important to know the reliability of a network's response, since a single network cannot be expected to accurately handle all the possible inputs.
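Selecting between networks by reconstruction quality can be sketched as follows. The sketch assumes each network exposes a function that maps an input to its reconstruction, and it uses mean squared error as a stand-in for whatever similarity measure the technique actually uses. The names `reconstruction_error`, `select_most_reliable`, `net_a`, and `net_b` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Reconstructor = Callable[[Vector], Vector]


def reconstruction_error(x: Vector, x_hat: Vector) -> float:
    """Mean squared difference between an input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def select_most_reliable(
    x: Vector, reconstructors: Sequence[Reconstructor]
) -> Tuple[int, List[float]]:
    """Return the index of the network with the lowest reconstruction error."""
    errors = [reconstruction_error(x, r(x)) for r in reconstructors]
    best = min(range(len(errors)), key=errors.__getitem__)
    return best, errors


# Toy stand-ins for two trained networks' input reconstructions.
x = [1.0, 0.0, 1.0]
net_a = lambda v: [0.9, 0.1, 1.0]  # reconstructs the input closely
net_b = lambda v: [0.0, 1.0, 0.0]  # reconstructs the input poorly

best, errors = select_most_reliable(x, [net_a, net_b])
```

Here the network with the lower error would be treated as the more reliable one for this input, and its output would be the one used.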


Feudal Reinforcement Learning

Neural Information Processing Systems

One way to speed up reinforcement learning is to enable learning to happen simultaneously at multiple resolutions in space and time. This paper shows how to create a Q-learning managerial hierarchy in which high-level managers learn how to set tasks for their sub-managers who, in turn, learn how to satisfy them. Sub-managers need not initially understand their managers' commands. They simply learn to maximise their reinforcement in the context of the current command. We illustrate the system using a simple maze task. As the system learns how to get around, satisfying commands at the multiple levels, it explores more efficiently than standard, flat Q-learning and builds a more comprehensive map. Straightforward reinforcement learning has been quite successful at some relatively complex tasks like playing backgammon (Tesauro, 1992).


Memory-Based Reinforcement Learning: Efficient Computation with Prioritized Sweeping

Neural Information Processing Systems

We present a new algorithm, Prioritized Sweeping, for efficient prediction and control of stochastic Markov systems. Incremental learning methods such as Temporal Differencing and Q-learning have fast real-time performance. Classical methods are slower, but more accurate, because they make full use of the observations. Prioritized Sweeping aims for the best of both worlds. It uses all previous experiences both to prioritize important dynamic programming sweeps and to guide the exploration of state space. We compare Prioritized Sweeping with other reinforcement learning schemes for a number of different stochastic optimal control problems. It successfully solves large state-space real-time problems with which other methods have difficulty.
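The prioritization idea can be sketched for the prediction case. This simplified sketch assumes the transition model and expected rewards are already known (the algorithm described above estimates them from experience and also guides exploration, which is omitted here). The function name, the `threshold` parameter, and the toy chain are illustrative assumptions.

```python
import heapq
from typing import Dict, List, Tuple

# state -> list of (probability, next_state)
Transitions = Dict[str, List[Tuple[float, str]]]


def prioritized_sweeping(
    transitions: Transitions,
    rewards: Dict[str, float],
    gamma: float = 0.9,
    threshold: float = 1e-6,
    max_backups: int = 10_000,
) -> Dict[str, float]:
    """Estimate state values, backing up the most urgent states first."""
    values = {s: 0.0 for s in transitions}

    # Record which states can lead to each state.
    predecessors: Dict[str, List[Tuple[float, str]]] = {s: [] for s in transitions}
    for s, outcomes in transitions.items():
        for prob, nxt in outcomes:
            predecessors[nxt].append((prob, s))

    def backup(s: str) -> float:
        return rewards[s] + gamma * sum(p * values[n] for p, n in transitions[s])

    # heapq is a min-heap, so priorities are stored negated.
    queue = [(-abs(backup(s) - values[s]), s) for s in transitions]
    heapq.heapify(queue)

    for _ in range(max_backups):
        if not queue:
            break
        _, s = heapq.heappop(queue)
        new_value = backup(s)
        change = abs(new_value - values[s])
        values[s] = new_value
        if change <= threshold:
            continue
        # A large change makes every predecessor's value stale; queue them.
        for prob, pred in predecessors[s]:
            priority = gamma * prob * change
            if priority > threshold:
                heapq.heappush(queue, (-priority, pred))
    return values


# Toy chain A -> B -> T, with reward 1.0 received on leaving B.
transitions = {"A": [(1.0, "B")], "B": [(1.0, "T")], "T": []}
rewards = {"A": 0.0, "B": 1.0, "T": 0.0}
values = prioritized_sweeping(transitions, rewards)
```

In the toy chain, the backup at B is processed first because it has the largest error, and its change then promotes A in the queue, so no sweep is wasted on states whose values cannot yet change.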


Global Regularization of Inverse Kinematics for Redundant Manipulators

Neural Information Processing Systems

When m < n, we say that the manipulator has redundant degrees of freedom (dof). The inverse kinematics problem is the following: given a desired workspace location x, find joint variables θ such that f(θ) = x. Even when the forward kinematics is known, the inverse kinematics for a manipulator is not generically solvable in closed form (Craig, 1986).


A Fast Stochastic Error-Descent Algorithm for Supervised Learning and Optimization

Neural Information Processing Systems

A parallel stochastic algorithm is investigated for error-descent learning and optimization in deterministic networks of arbitrary topology. No explicit information about internal network structure is needed. The method is based on the model-free distributed learning mechanism of Dembo and Kailath. A modified parameter update rule is proposed by which each individual parameter vector perturbation contributes a decrease in error. A substantially faster learning speed is hence allowed. Furthermore, the modified algorithm supports learning time-varying features in dynamical networks. We analyze the convergence and scaling properties of the algorithm, and present simulation results for dynamic trajectory learning in recurrent networks.
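A perturbation-based update of this kind can be sketched as follows. The sketch assumes a one-sided error difference, random perturbations of fixed magnitude `sigma` with random signs, and a toy quadratic error; the exact rule, constants, and names used here are illustrative assumptions rather than the paper's specification.

```python
import random
from typing import Callable, List, Sequence


def stochastic_error_descent(
    error: Callable[[Sequence[float]], float],
    params: Sequence[float],
    mu: float,
    sigma: float,
    steps: int,
    seed: int = 0,
) -> List[float]:
    """Descend the error using only error evaluations, never gradients."""
    rng = random.Random(seed)
    p = list(params)
    for _ in range(steps):
        # Perturb all parameters at once: each component is +sigma or -sigma.
        pi = [sigma if rng.random() < 0.5 else -sigma for _ in p]
        # Observed change in error caused by the perturbation.
        e_hat = error([a + b for a, b in zip(p, pi)]) - error(p)
        # Step against the perturbation, scaled by the observed error change.
        p = [a - mu * e_hat * b for a, b in zip(p, pi)]
    return p


target = [1.0, -2.0, 0.5]


def quadratic_error(p: Sequence[float]) -> float:
    return sum((a - t) ** 2 for a, t in zip(p, target))


start = [0.0, 0.0, 0.0]
final = stochastic_error_descent(quadratic_error, start, mu=40.0, sigma=0.05, steps=300)
```

To first order the change in error per step is proportional to minus the squared projection of the gradient onto the perturbation, which is why each perturbation tends to reduce the error without any knowledge of the network's internal structure.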


Synchronization and Grammatical Inference in an Oscillating Elman Net

Neural Information Processing Systems

We have designed an architecture to span the gap between biophysics and cognitive science to address and explore issues of how a discrete symbol processing system can arise from the continuum, and how complex dynamics like oscillation and synchronization can then be employed in its operation and affect its learning. We show how a discrete-time recurrent "Elman" network architecture can be constructed from recurrently connected oscillatory associative memory modules described by continuous nonlinear ordinary differential equations. The modules can learn connection weights between themselves which will cause the system to evolve under a clocked "machine cycle" by a sequence of transitions of attractors within the modules, much as a digital computer evolves by transitions of its binary flip-flop attractors. The architecture thus employs the principle of "computing with attractors" used by macroscopic systems for reliable computation in the presence of noise. We have specifically constructed a system which functions as a finite state automaton that recognizes or generates the infinite set of six symbol strings that are defined by a Reber grammar. It is a symbol processing system, but with analog input and oscillatory subsymbolic representations. The time steps (machine cycles) of the system are implemented by rhythmic variation (clocking) of a bifurcation parameter. This holds input and "context" modules clamped at their attractors while hidden and output modules change state, then clamps hidden and output states while context modules are released to load those states as the new context for the next cycle of input. Superior noise immunity has been demonstrated for systems with dynamic attractors over systems with static attractors, and synchronization ("binding") between coupled oscillatory attractors in different modules has been shown to be important for effecting reliable transitions.


Extended Regularization Methods for Nonconvergent Model Selection

Neural Information Processing Systems

Many techniques for model selection in the field of neural networks correspond to well-established statistical methods. The method of 'stopped training', on the other hand, in which an oversized network is trained until the error on a further validation set of examples deteriorates, at which point training is stopped, is a true innovation, since model selection does not require convergence of the training process. In this paper we show that its performance can be significantly enhanced by extending the 'nonconvergent model selection method' of stopped training to include dynamic topology modifications (dynamic weight pruning) and modified complexity penalty term methods in which the weighting of the penalty term is adjusted during the training process. One of the central topics in the field of neural networks is that of model selection. Both the theoretical and practical sides of this have been intensively investigated, and a vast array of methods has been suggested to perform this task. A widely used class of techniques starts by choosing an 'oversized' network architecture and then either removing redundant elements based on some measure of saliency (pruning), adding a further term to the cost function that penalizes complexity (penalty terms), or observing the error on a further validation set of examples and stopping training as soon as this performance begins to deteriorate (stopped training).
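Basic stopped training, without the pruning and penalty-term extensions discussed above, can be sketched as a loop that monitors validation error. The callbacks `train_step` and `validation_error` and the `patience` parameter are hypothetical placeholders for a real training setup; the validation curve below is simulated, not measured.

```python
from typing import Callable, Tuple


def stopped_training(
    train_step: Callable[[], None],
    validation_error: Callable[[], float],
    max_epochs: int = 1000,
    patience: int = 1,
) -> Tuple[int, float]:
    """Train until validation error stops improving; return the best epoch and error."""
    best_error = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_step()
        err = validation_error()
        if err < best_error:
            best_error, best_epoch = err, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # validation error has begun to deteriorate
    return best_epoch, best_error


# Simulated validation curve: improves for three epochs, then deteriorates.
curve = iter([5.0, 3.0, 2.0, 2.5, 4.0])
best_epoch, best_error = stopped_training(
    train_step=lambda: None,
    validation_error=lambda: next(curve),
    max_epochs=5,
)
```

A practical version would also save the weights at the best epoch and restore them after stopping, since the final weights belong to an epoch after the minimum.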


A Note on Learning Vector Quantization

Neural Information Processing Systems

Vector Quantization is useful for data compression. Competitive Learning, which minimizes reconstruction error, is an appropriate algorithm for vector quantization of unlabelled data. Vector quantization of labelled data for classification has a different objective, to minimize the number of misclassifications, and a different algorithm is appropriate. We show that a variant of Kohonen's LVQ2.1 algorithm can be seen as a multiclass extension of an algorithm which in a restricted two-class case can be proven to converge to the Bayes optimal classification boundary. We compare the performance of the LVQ2.1 algorithm to that of a modified version having a decreasing window and normalized step size, on a ten-class vowel classification problem.
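For reference, one update step of the standard LVQ2.1 rule (not the modified version with a decreasing window and normalized step size) can be sketched as below. The fixed `alpha` and `window` values, the function name, and the one-dimensional toy codebook are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

Codebook = List[Tuple[List[float], str]]  # (prototype vector, class label)


def lvq21_step(
    codebook: Codebook,
    x: Sequence[float],
    label: str,
    alpha: float = 0.1,
    window: float = 0.3,
) -> bool:
    """Apply one LVQ2.1 update in place; return True if prototypes moved."""
    # Find the two prototypes nearest to x.
    ranked = sorted(range(len(codebook)), key=lambda k: math.dist(codebook[k][0], x))
    (m_i, c_i), (m_j, c_j) = codebook[ranked[0]], codebook[ranked[1]]
    d_i, d_j = math.dist(m_i, x), math.dist(m_j, x)

    # Exactly one of the two nearest prototypes must carry the correct label.
    if c_i == c_j or label not in (c_i, c_j):
        return False

    # Window test: x must lie near the midplane between the two prototypes.
    # Since d_i <= d_j, min(d_i/d_j, d_j/d_i) reduces to d_i/d_j.
    s = (1.0 - window) / (1.0 + window)
    if d_j == 0.0 or d_i / d_j <= s:
        return False

    correct, wrong = (m_i, m_j) if c_i == label else (m_j, m_i)
    for k in range(len(x)):
        correct[k] += alpha * (x[k] - correct[k])  # attract the correct prototype
        wrong[k] -= alpha * (x[k] - wrong[k])      # repel the incorrect prototype
    return True


codebook = [([0.0], "a"), ([1.0], "b")]
moved = lvq21_step(codebook, [0.45], "b")
```

In this toy case the input lies inside the window and is labelled "b" although the nearest prototype is "a", so the "b" prototype is pulled toward the input and the "a" prototype is pushed away, shifting the decision boundary toward the "a" side.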


Summed Weight Neuron Perturbation: An O(N) Improvement Over Weight Perturbation

Neural Information Processing Systems

The algorithm presented performs gradient descent on the weight space of an Artificial Neural Network (ANN), using a finite difference to approximate the gradient. The method is novel in that it achieves a computational complexity similar to that of Node Perturbation, O(N^3), but does not require access to the activity of hidden or internal neurons. This is possible due to a stochastic relation between perturbations at the weights and the neurons of an ANN. The algorithm is also similar to Weight Perturbation in that it is optimal in terms of hardware requirements when used for the training of VLSI implementations of ANNs.
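As a point of comparison, plain Weight Perturbation (the baseline, not the summed weight neuron perturbation method itself) can be sketched as a forward finite difference taken one weight at a time. The function names, the step size `delta`, and the toy quadratic error are illustrative assumptions.

```python
from typing import Callable, List, Sequence


def weight_perturbation_gradient(
    error: Callable[[Sequence[float]], float],
    weights: Sequence[float],
    delta: float = 1e-4,
) -> List[float]:
    """Approximate dE/dw_i by perturbing each weight in turn."""
    base = error(weights)
    grad = []
    for i in range(len(weights)):
        perturbed = list(weights)
        perturbed[i] += delta
        # Forward finite difference for weight i.
        grad.append((error(perturbed) - base) / delta)
    return grad


def gradient_descent_step(
    error: Callable[[Sequence[float]], float],
    weights: Sequence[float],
    learning_rate: float = 0.1,
    delta: float = 1e-4,
) -> List[float]:
    """One descent step using the finite-difference gradient estimate."""
    grad = weight_perturbation_gradient(error, weights, delta)
    return [w - learning_rate * g for w, g in zip(weights, grad)]


def toy_error(w: Sequence[float]) -> float:
    return sum(v * v for v in w)


weights = [1.0, -2.0]
grad = weight_perturbation_gradient(toy_error, weights)
new_weights = gradient_descent_step(toy_error, weights)
```

This baseline needs one extra error evaluation per weight, which is the cost that perturbing at the level of neurons rather than individual weights is meant to reduce.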