AITopics

Recent research on reinforcement learning has focused on algorithms based on the principles of Dynamic Programming (DP). One of the most promising areas of application for these algorithms is the control of dynamical systems, and some impressive results have been achieved. However, there are significant gaps between practice and theory. In particular, there are no convergence proofs for problems with continuous state and action spaces, or for systems involving nonlinear function approximators (such as multilayer perceptrons). This paper presents research applying DP-based reinforcement learning theory to Linear Quadratic Regulation (LQR), an important class of control problems involving continuous state and action spaces and requiring a simple type of nonlinear function approximator. We describe an algorithm based on Q-learning that is proven to converge to the optimal controller for a large class of LQR problems. We also describe a slightly different algorithm that is only locally convergent to the optimal Q-function, demonstrating one of the possible pitfalls of using a nonlinear function approximator with DP-based learning.

algorithm, policy iteration algorithm, q-function, (10 more...)

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Asia > Middle East > Jordan (0.05)
(3 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.55)

Input Reconstruction Reliability Estimation

Pomerleau, Dean A.

This paper describes a technique called Input Reconstruction Reliability Estimation (IRRE) for determining the response reliability of a restricted class of multi-layer perceptrons (MLPs). The technique uses a network's ability to accurately encode the input pattern in its internal representation as a measure of its reliability. The more accurately a network is able to reconstruct the input pattern from its internal representation, the more reliable the network is considered to be. IRRE provides a good estimate of the reliability of MLPs trained for autonomous driving. Results are presented in which the reliability estimates provided by IRRE are used to select between networks trained for different driving situations. 1 Introduction In many real-world domains it is important to know the reliability of a network's response since a single network cannot be expected to accurately handle all the possible inputs.

irre, reliability, representation, (12 more...)

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > New Jersey > Essex County > Newark (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Industry:

Transportation > Ground > Road (0.36)
Automobiles & Trucks (0.36)
Information Technology (0.36)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.54)

Dayan, Peter, Hinton, Geoffrey E.

Feudal Reinforcement Learning

One way to speed up reinforcement learning is to enable learning to happen simultaneously at multiple resolutions in space and time. This paper shows how to create a Q-learning managerial hierarchy in which high level managers learn how to set tasks to their sub-managers who, in turn, learn how to satisfy them. Sub-managers need not initially understand their managers' commands. They simply learn to maximise their reinforcement in the context of the current command. We illustrate the system using a simple maze task. As the system learns how to get around, satisfying commands at the multiple levels, it explores more efficiently than standard, flat, Q-learning and builds a more comprehensive map. 1 INTRODUCTION Straightforward reinforcement learning has been quite successful at some relatively complex tasks like playing backgammon (Tesauro, 1992).

agent, level manager, reinforcement, (15 more...)

Country:

North America > Canada > Ontario > Toronto (0.15)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > California > San Mateo County > San Mateo (0.05)
(4 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

DeMers, David, Kreutz-Delgado, Kenneth

Global Regularization of Inverse Kinematics for Redundant Manipulators

When m < n, we say that the manipulator has redundant degrees-of-freedom (dof). The inverse kinematics problem is the following: given a desired workspace location x, find joint variables θ such that f(θ) = x. Even when the forward kinematics is known, the inverse kinematics for a manipulator is not generically solvable in closed form (Craig, 1986).
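A small illustration may help make the redundancy concrete. The sketch below is not taken from the paper: it assumes a hypothetical planar arm with three unit-length links (three joint angles, two workspace coordinates), and the names `forward_kinematics` and `self_motion_family` are invented for this example. It shows that a whole one-parameter family of joint settings reaches the same workspace point, which is why the inverse problem has no unique answer.

```python
import math

def forward_kinematics(thetas, link_lengths=(1.0, 1.0, 1.0)):
    """Map relative joint angles (radians) to the planar end-effector position (x, y).

    Assumption: a hypothetical planar serial arm, used here only for illustration.
    """
    x = y = 0.0
    angle = 0.0
    for theta, length in zip(thetas, link_lengths):
        angle += theta                  # absolute orientation of this link
        x += length * math.cos(angle)
        y += length * math.sin(angle)
    return x, y

def self_motion_family(phi):
    """Joint settings that all reach (1, 0) for unit links.

    The third link folds back along the second (relative angle pi),
    so links two and three cancel for any value of phi.
    """
    return (0.0, phi, math.pi)

if __name__ == "__main__":
    for phi in (0.0, 0.5, 1.0, 2.0):
        x, y = forward_kinematics(self_motion_family(phi))
        print(f"phi={phi:.1f} -> x={x:.6f}, y={y:.6f}")
```

Every value of `phi` gives a different arm posture but the same end-effector location, so asking for "the" joint variables at x = (1, 0) is ill-posed without some extra selection criterion.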

input space, manifold, manipulator, (14 more...)

Country:

North America > United States > California > San Diego County > San Diego (0.05)
North America > United States > California > San Diego County > La Jolla (0.05)
Asia > Middle East > Jordan (0.05)

Technology:

Information Technology > Artificial Intelligence > Robots (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)

Finnoff, W., Hergert, F., Zimmermann, H. G.

Extended Regularization Methods for Nonconvergent Model Selection

Many techniques for model selection in the field of neural networks correspond to well established statistical methods. The method of 'stopped training', on the other hand, in which an oversized network is trained until the error on a further validation set of examples deteriorates, then training is stopped, is a true innovation, since model selection doesn't require convergence of the training process. In this paper we show that this performance can be significantly enhanced by extending the 'nonconvergent model selection method' of stopped training to include dynamic topology modifications (dynamic weight pruning) and modified complexity penalty term methods in which the weighting of the penalty term is adjusted during the training process. 1 INTRODUCTION One of the central topics in the field of neural networks is that of model selection. Both the theoretical and practical side of this have been intensively investigated and a vast array of methods have been suggested to perform this task. A widely used class of techniques starts by choosing an 'oversized' network architecture then either removing redundant elements based on some measure of saliency (pruning), adding a further term to the cost function penalizing complexity (penalty terms), and finally, observing the error on a further validation set of examples, then stopping training as soon as this performance begins to deteriorate (stopped training).
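The basic stopped-training rule described in this abstract can be sketched in a few lines. This is only a minimal illustration of plain stopped training, not the authors' extended method with pruning or adaptive penalty terms. The function name `stopped_training` and the `patience` parameter are assumptions introduced here; the abstract itself describes stopping as soon as validation error deteriorates, which corresponds to `patience=1`.

```python
def stopped_training(validation_errors, patience=1):
    """Return (best_epoch, best_error) under a simple stopped-training rule.

    validation_errors: per-epoch error on a held-out validation set.
    patience: hypothetical knob (not from the source); training stops after
              this many consecutive epochs without a new best validation error.
    """
    best_epoch, best_error = 0, float("inf")
    epochs_without_improvement = 0
    for epoch, error in enumerate(validation_errors):
        if error < best_error:
            best_epoch, best_error = epoch, error
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # validation error has deteriorated: stop training
    return best_epoch, best_error

if __name__ == "__main__":
    # Made-up validation curve, for illustration only.
    errors = [0.90, 0.60, 0.45, 0.50, 0.40]
    print(stopped_training(errors, patience=1))
    print(stopped_training(errors, patience=2))
```

With `patience=1` the loop halts at the first rise in validation error and never sees the later, lower value; a larger `patience` tolerates a temporary rise at the cost of extra training epochs.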

penalty term, test variable, training process, (14 more...)

Country:

Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Asia > Singapore (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Kohonen Feature Maps and Growing Cell Structures - a Performance Comparison

Fritzke, Bernd

A performance comparison of two self-organizing networks, the Kohonen Feature Map and the recently proposed Growing Cell Structures is made. For this purpose several performance criteria for self-organizing networks are proposed and motivated. The models are tested with three example problems of increasing difficulty. The Kohonen Feature Map demonstrates slightly superior results only for the simplest problem.

cell structure, neuron, reference vector, (15 more...)

Country:

Europe > Netherlands > North Holland > Amsterdam (0.05)
North America > United States > California > Alameda County > Berkeley (0.04)
Asia > Singapore (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Learning Sequential Tasks by Incrementally Adding Higher Orders

Ring, Mark

An incremental, higher-order, non-recurrent network combines two properties found to be useful for learning sequential tasks: higher-order connections and incremental introduction of new units. The network adds higher orders when needed by adding new units that dynamically modify connection weights. Since the new units modify the weights at the next time-step with information from the previous step, temporal tasks can be learned without the use of feedback, thereby greatly simplifying training. Furthermore, a theoretically unlimited number of units can be added to reach into the arbitrarily distant past. Experiments with the Reber grammar have demonstrated speedups of two orders of magnitude over recurrent networks.

learning sequential task, new unit, recurrent network, (11 more...)

Country:

North America > United States > Texas > Travis County > Austin (0.14)
North America > United States > California > San Mateo County > San Mateo (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.05)
(3 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Q-Learning with Hidden-Unit Restarting

Anderson, Charles W.

Platt's resource-allocation network (RAN) (Platt, 1991a, 1991b) is modified for a reinforcement-learning paradigm and to "restart" existing hidden units rather than adding new units. After restarting, units continue to learn via back-propagation. The resulting restart algorithm is tested in a Q-learning network that learns to solve an inverted pendulum problem. Solutions are found faster on average with the restart algorithm than without it.

algorithm, pendulum, restart, (14 more...)

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
North America > United States > California > San Mateo County > San Mateo (0.05)
Asia > Middle East > Jordan (0.05)
(5 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

On the Use of Projection Pursuit Constraints for Training Neural Networks

Intrator, Nathan

Some improved generalization properties are demonstrated.

ion, projection index, projection pursuit, (13 more...)

Country:

North America > United States > California > San Mateo County > San Mateo (0.05)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > Santa Clara County > Cupertino (0.04)
(3 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

How Oscillatory Neuronal Responses Reflect Bistability and Switching of the Hidden Assembly Dynamics

Pawelzik, K., Bauer, H.-U., Deppisch, J., Geisel, T.

A switching between apparently coherent (oscillatory) and stochastic episodes of activity has been observed in responses from cat and monkey visual cortex. We describe the dynamics of these phenomena in two parallel approaches, a phenomenological and a rather microscopic one. On the one hand we analyze neuronal responses in terms of a hidden state model (HSM). The parameters of this model are extracted directly from experimental spike trains. They characterize the underlying dynamics as well as the coupling of individual neurons to the network. This phenomenological model thus provides a new framework for the experimental analysis of network dynamics.

assembly, neuron, oscillatory neuronal response reflect bistability, (12 more...)

Country:

Europe > Germany > Baden-Württemberg > Karlsruhe Region > Weinheim (0.05)
North America > United States > California > San Francisco County > San Francisco (0.04)
Europe > United Kingdom > England > East Sussex > Brighton (0.04)
(2 more...)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.37)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.70)