Europe
Feudal Reinforcement Learning
Dayan, Peter, Hinton, Geoffrey E.
One way to speed up reinforcement learning is to enable learning to happen simultaneously at multiple resolutions in space and time. This paper shows how to create a Q-Iearning managerial hierarchy in which high level managers learn how to set tasks to their submanagers who, in turn, learn how to satisfy them. Sub-managers need not initially understand their managers' commands. They simply learn to maximise their reinforcement in the context of the current command. We illustrate the system using a simple maze task.. As the system learns how to get around, satisfying commands at the multiple levels, it explores more efficiently than standard, flat, Q-Iearning and builds a more comprehensive map. 1 INTRODUCTION Straightforward reinforcement learning has been quite successful at some relatively complex tasks like playing backgammon (Tesauro, 1992).
Memory-Based Reinforcement Learning: Efficient Computation with Prioritized Sweeping
Moore, Andrew W., Atkeson, Christopher G.
We present a new algorithm, Prioritized Sweeping, for efficient prediction and control of stochastic Markov systems. Incremental learning methods such as Temporal Differencing and Q-Iearning have fast real time performance. Classical methods are slower, but more accurate, because they make full use of the observations. Prioritized Sweeping aims for the best of both worlds. It uses all previous experiences both to prioritize important dynamic programming sweeps and to guide the exploration of statespace. We compare Prioritized Sweeping with other reinforcement learning schemes for a number of different stochastic optimal control problems. It successfully solves large state-space real time problems with which other methods have difficulty.
Synchronization and Grammatical Inference in an Oscillating Elman Net
Baird, Bill, Troyer, Todd, Eeckman, Frank
We have designed an architecture to span the gap between biophysics and cognitive science to address and explore issues of how a discrete symbol processing system can arise from the continuum, and how complex dynamics like oscillation and synchronization can then be employed in its operation and affect its learning. We show how a discrete-time recurrent "Elman" network architecture can be constructed from recurrently connected oscillatory associative memory modules described by continuous nonlinear ordinary differential equations. The modules can learn connection weights between themselves which will cause the system to evolve under a clocked "machine cycle" by a sequence of transitions of attractors within the modules, much as a digital computer evolves by transitions of its binary flip-flop attractors. The architecture thus employs the principle of "computing with attractors" used by macroscopic systems for reliable computation in the presence of noise. We have specifically constructed a system which functions as a finite state automaton that recognizes or generates the infinite set of six symbol strings that are defined by a Reber grammar. It is a symbol processing system, but with analog input and oscillatory subsymbolic representations. The time steps (machine cycles) of the system are implemented by rhythmic variation (clocking) of a bifurcation parameter. This holds input and "context" modules clamped at their attractors while'hidden and output modules change state, then clamps hidden and output states while context modules are released to load those states as the new context for the next cycle of input. Superior noise immunity has been demonstrated for systems with dynamic attractors over systems with static attractors, and synchronization ("binding") between coupled oscillatory attractors in different modules has been shown to be important for effecting reliable transitions.
Extended Regularization Methods for Nonconvergent Model Selection
Finnoff, W., Hergert, F., Zimmermann, H. G.
Many techniques for model selection in the field of neural networks correspond to well established statistical methods. The method of'stopped training', on the other hand, in which an oversized network is trained until the error on a further validation set of examples deteriorates, then training is stopped, is a true innovation, since model selection doesn't require convergence of the training process. In this paper we show that this performance can be significantly enhanced by extending the'non convergent model selection method' of stopped training to include dynamic topology modifications (dynamic weight pruning) and modified complexity penalty term methods in which the weighting of the penalty term is adjusted during the training process. 1 INTRODUCTION One of the central topics in the field of neural networks is that of model selection. Both the theoretical and practical side of this have been intensively investigated and a vast array of methods have been suggested to perform this task. A widely used class of techniques starts by choosing an'oversized' network architecture then either removing redundant elements based on some measure of saliency (pruning), adding a further term to the cost function penalizing complexity (penalty terms), and finally, observing the error on a further validation set of examples, then stopping training as soon as this performance begins to deteriorate (stopped training).
A Note on Learning Vector Quantization
Sa, Virginia R. de, Ballard, Dana H.
Vector Quantization is useful for data compression. Competitive Learning which minimizes reconstruction error is an appropriate algorithm for vector quantization of unlabelled data. Vector quantization of labelled data for classification has a different objective, to minimize the number of misclassifications, and a different algorithm is appropriate. We show that a variant of Kohonen's LVQ2.1 algorithm can be seen as a multiclass extension of an algorithm which in a restricted 2 class case can be proven to converge to the Bayes optimal classification boundary. We compare the performance of the LVQ2.1 algorithm to that of a modified version having a decreasing window and normalized step size, on a ten class vowel classification problem.
Assessing and Improving Neural Network Predictions by the Bootstrap Algorithm
The bootstrap method offers an computation intensive alternative to estimate the predictive distribution for a neural network even if the analytic derivation is intractable. The available asymptotic results show that it is valid for a large number of linear, nonlinear and even nonparametric regression problems. It has the potential to model the distribution of estimators to a higher precision than the usual normal asymptotics. It even may be valid if the normal asymptotics fail. However, the theoretical properties of bootstrap procedures for neural networks - especially nonlinear models - have to be investigated more comprehensively.
A Boundary Hunting Radial Basis Function Classifier which Allocates Centers Constructively
Chang, Eric I., Lippmann, Richard P.
A new boundary hunting radial basis function (BH-RBF) classifier which allocates RBF centers constructively near class boundaries is described. This classifier creates complex decision boundaries only in regions where confusions occur and corresponding RBF outputs are similar. A predicted square error measure is used to determine how many centers to add and to determine when to stop adding centers. Two experiments are presented which demonstrate the advantages of the BH RBF classifier. One uses artificial data with two classes and two input features where each class contains four clusters but only one cluster is near a decision region boundary.
Kohonen Feature Maps and Growing Cell Structures - a Performance Comparison
A performance comparison of two self-organizing networks, the Kohonen Feature Map and the recently proposed Growing Cell Structures is made. For this purpose several performance criteria for self-organizing networks are proposed and motivated. The models are tested with three example problems of increasing difficulty. The Kohonen Feature Map demonstrates slightly superior results only for the simplest problem.
Learning Sequential Tasks by Incrementally Adding Higher Orders
An incremental, higher-order, non-recurrent network combines two properties found to be useful for learning sequential tasks: higherorder connections and incremental introduction of new units. The network adds higher orders when needed by adding new units that dynamically modify connection weights. Since the new units modify the weights at the next time-step with information from the previous step, temporal tasks can be learned without the use of feedback, thereby greatly simplifying training. Furthermore, a theoretically unlimited number of units can be added to reach into the arbitrarily distant past. Experiments with the Reber grammar have demonstrated speedups of two orders of magnitude over recurrent networks.