Asia
Reinforcement Learning Applied to Linear Quadratic Regulation
Recent research on reinforcement learning has focused on algorithms based on the principles of Dynamic Programming (DP). One of the most promising areas of application for these algorithms is the control of dynamical systems, and some impressive results have been achieved. However, there are significant gaps between practice and theory. In particular, there are no con vergence proofs for problems with continuous state and action spaces, or for systems involving nonlinear function approximators (such as multilayer perceptrons). This paper presents research applying DPbased reinforcement learning theory to Linear Quadratic Regulation (LQR), an important class of control problems involving continuous state and action spaces and requiring a simple type of nonlinear function approximator. We describe an algorithm based on Q-Iearning that is proven to converge to the optimal controller for a large class of LQR problems. We also describe a slightly different algorithm that is only locally convergent to the optimal Q-function, demonstrating one of the possible pitfalls of using a nonlinear function approximator with DPbased learning.
Input Reconstruction Reliability Estimation
This paper describes a technique called Input Reconstruction Reliability Estimation (IRRE) for determining the response reliability of a restricted class of multi-layer perceptrons (MLPs). The technique uses a network's ability to accurately encode the input pattern in its internal representation as a measure of its reliability. The more accurately a network is able to reconstruct the input pattern from its internal representation, the more reliable the network is considered to be. IRRE is provides a good estimate of the reliability of MLPs trained for autonomous driving. Results are presented in which the reliability estimates provided by IRRE are used to select between networks trained for different driving situations. 1 Introduction In many real world domains it is important to know the reliability of a network's response since a single network cannot be expected to accurately handle all the possible inputs.
Feudal Reinforcement Learning
Dayan, Peter, Hinton, Geoffrey E.
One way to speed up reinforcement learning is to enable learning to happen simultaneously at multiple resolutions in space and time. This paper shows how to create a Q-Iearning managerial hierarchy in which high level managers learn how to set tasks to their submanagers who, in turn, learn how to satisfy them. Sub-managers need not initially understand their managers' commands. They simply learn to maximise their reinforcement in the context of the current command. We illustrate the system using a simple maze task.. As the system learns how to get around, satisfying commands at the multiple levels, it explores more efficiently than standard, flat, Q-Iearning and builds a more comprehensive map. 1 INTRODUCTION Straightforward reinforcement learning has been quite successful at some relatively complex tasks like playing backgammon (Tesauro, 1992).
Global Regularization of Inverse Kinematics for Redundant Manipulators
DeMers, David, Kreutz-Delgado, Kenneth
When m n, we say that the manipulator has redundant degrees--of -freedom (dot). The inverse kinematics problem is the following: given a desired workspace location x, find joint variables 0 such that f(O) x. Even when the forward kinematics is known, 255 256 DeMers and Kreutz-Delgado the inverse kinematics for a manipulator is not generically solvable in closed form (Craig. 1986).
Extended Regularization Methods for Nonconvergent Model Selection
Finnoff, W., Hergert, F., Zimmermann, H. G.
Many techniques for model selection in the field of neural networks correspond to well established statistical methods. The method of'stopped training', on the other hand, in which an oversized network is trained until the error on a further validation set of examples deteriorates, then training is stopped, is a true innovation, since model selection doesn't require convergence of the training process. In this paper we show that this performance can be significantly enhanced by extending the'non convergent model selection method' of stopped training to include dynamic topology modifications (dynamic weight pruning) and modified complexity penalty term methods in which the weighting of the penalty term is adjusted during the training process. 1 INTRODUCTION One of the central topics in the field of neural networks is that of model selection. Both the theoretical and practical side of this have been intensively investigated and a vast array of methods have been suggested to perform this task. A widely used class of techniques starts by choosing an'oversized' network architecture then either removing redundant elements based on some measure of saliency (pruning), adding a further term to the cost function penalizing complexity (penalty terms), and finally, observing the error on a further validation set of examples, then stopping training as soon as this performance begins to deteriorate (stopped training).
Kohonen Feature Maps and Growing Cell Structures - a Performance Comparison
A performance comparison of two self-organizing networks, the Kohonen Feature Map and the recently proposed Growing Cell Structures is made. For this purpose several performance criteria for self-organizing networks are proposed and motivated. The models are tested with three example problems of increasing difficulty. The Kohonen Feature Map demonstrates slightly superior results only for the simplest problem.
Learning Sequential Tasks by Incrementally Adding Higher Orders
An incremental, higher-order, non-recurrent network combines two properties found to be useful for learning sequential tasks: higherorder connections and incremental introduction of new units. The network adds higher orders when needed by adding new units that dynamically modify connection weights. Since the new units modify the weights at the next time-step with information from the previous step, temporal tasks can be learned without the use of feedback, thereby greatly simplifying training. Furthermore, a theoretically unlimited number of units can be added to reach into the arbitrarily distant past. Experiments with the Reber grammar have demonstrated speedups of two orders of magnitude over recurrent networks.
Q-Learning with Hidden-Unit Restarting
Platt's resource-allocation network (RAN) (Platt, 1991a, 1991b) is modified for a reinforcement-learning paradigm and to "restart" existing hidden units rather than adding new units. After restarting, units continue to learn via back-propagation. The resulting restart algorithm is tested in a Q-Iearning network that learns to solve an inverted pendulum problem. Solutions are found faster on average with the restart algorithm than without it.
How Oscillatory Neuronal Responses Reflect Bistability and Switching of the Hidden Assembly Dynamics
Pawelzik, K., Bauer, H.-U., Deppisch, J., Geisel, T.
A switching between apparently coherent (oscillatory) and stochastic episodes of activity has been observed in responses from cat and monkey visual cortex. We describe the dynamics of these phenomena in two parallel approaches,a phenomenological and a rather microscopic one. On the one hand we analyze neuronal responses in terms of a hidden state model (HSM). The parameters of this model are extracted directly from experimental spiketrains. They characterize the underlying dynamics as well as the coupling of individual neurons to the network. This phenomenological modelthus provides a new framework for the experimental analysis of network dynamics.