AITopics

In this paper, we discuss online estimation strategies that model the optimal value function of a typical optimal control problem. We present a general strategy that uses local corridor solutions obtained via dynamic programming to provide local optimal control sequence training data for a neural architecture model of the optimal value function.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

Country: North America > United States (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.52)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.47)

Mitchell, Tom M., Thrun, Sebastian B.

Explanation-Based Neural Network Learning for Robot Control

How can artificial neural nets generalize better from fewer examples? In order to generalize successfully, neural network learning methods typically require large training data sets. We introduce a neural network learning method that generalizes rationally from many fewer data points, relying instead on prior knowledge encoded in previously learned neural networks. For example, in robot control learning tasks reported here, previously learned networks that model the effects of robot actions are used to guide subsequent learning of robot control functions. For each observed training example of the target function (e.g. the robot control policy), the learner explains the observed example in terms of its prior knowledge, then analyzes this explanation to infer additional information about the shape, or slope, of the target function. This shape knowledge is used to bias generalization when learning the target function. Results are presented applying this approach to a simulated robot task based on reinforcement learning.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
North America > United States > California > San Mateo County (0.14)
Europe > United Kingdom > England (0.14)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Using Aperiodic Reinforcement for Directed Self-Organization During Development

Montague, P. R., Dayan, P., Nowlan, S.J., Pouget, A, Sejnowski, T.J.

We present a local learning rule in which Hebbian learning is conditional on an incorrect prediction of a reinforcement signal. We propose a biological interpretation of such a framework and display its utility through examples in which the reinforcement signal is cast as the delivery of a neuromodulator to its target. Three examples are presented which illustrate how this framework can be applied to the development of the oculomotor system.

1 INTRODUCTION

Activity-dependent accounts of the self-organization of the vertebrate brain have relied ubiquitously on correlational (mainly Hebbian) rules to drive synaptic learning. In the brain, a major problem for any such unsupervised rule is that many different kinds of correlations exist at approximately the same time scales and each is effectively noise to the next. For example, relationships within and between the retinae among variables such as color, motion, and topography may mask one another and disrupt their appropriate segregation at the level of the thalamus or cortex.

machine learning, reinforcement, reinforcement learning, (17 more...)

Country: North America > United States > Massachusetts > Hampshire County > Amherst (0.14)

Industry: Health & Medicine (0.90)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Leen, Todd K., Moody, John E.

Weight Space Probability Densities in Stochastic Learning: I. Dynamics and Equilibria

The ensemble dynamics of stochastic learning algorithms can be studied using theoretical techniques from statistical physics. We develop the equations of motion for the weight space probability densities for stochastic learning algorithms. We discuss equilibria in the diffusion approximation and provide expressions for special cases of the LMS algorithm. The equilibrium densities are not in general thermal (Gibbs) distributions in the objective function being minimized, but rather depend upon an effective potential that includes diffusion effects. Finally, we present an exact analytical expression for the time evolution of the density for a learning algorithm with weight updates proportional to the sign of the gradient.
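The notion of a weight space probability density can be made concrete by simulating many independent learners and looking at where their weights end up. The following is a minimal sketch, not the paper's analysis: it assumes a one-dimensional linear model, Gaussian inputs and noise, and a fixed step size, all of which are illustrative choices. The function name and every parameter are hypothetical.

```python
import random

def sign_lms_ensemble(n_learners=200, n_steps=500, mu=0.01,
                      w_true=0.5, noise_std=0.1, seed=0):
    """Run an ensemble of 1-D learners that step by mu * sign(gradient).

    All settings here are assumptions for illustration only.
    Returns the final weight of every learner, i.e. samples from the
    empirical weight-space density after n_steps updates.
    """
    rng = random.Random(seed)
    weights = [0.0] * n_learners
    for _ in range(n_steps):
        for i in range(n_learners):
            x = rng.gauss(0.0, 1.0)
            y = w_true * x + rng.gauss(0.0, noise_std)
            # Gradient of the per-sample cost 0.5 * (w*x - y)**2 with respect to w.
            grad = (weights[i] * x - y) * x
            # sign(grad) in {-1, 0, +1}; the update size never depends on |grad|.
            step = (grad > 0) - (grad < 0)
            weights[i] -= mu * step
    return weights

if __name__ == "__main__":
    w = sign_lms_ensemble()
    mean = sum(w) / len(w)
    spread = (sum((v - mean) ** 2 for v in w) / len(w)) ** 0.5
    print(f"ensemble mean = {mean:.3f}, ensemble std = {spread:.3f}")
```

Because every update has magnitude `mu`, each learner moves on a lattice of spacing `mu`, and the ensemble settles into a narrow band around `w_true` whose width is set by the noise and the step size.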

artificial intelligence, machine learning, reinforcement learning, (13 more...)

Country: North America > United States > Oregon (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.84)

Learning Control Under Extreme Uncertainty

Gullapalli, Vijaykumar

A peg-in-hole insertion task is used as an example to illustrate the utility of direct associative reinforcement learning methods for learning control under real-world conditions of uncertainty and noise. Task complexity due to the use of an unchamfered hole and a clearance of less than 0.2mm is compounded by the presence of positional uncertainty of magnitude exceeding 10 to 50 times the clearance. Despite this extreme degree of uncertainty, our results indicate that direct reinforcement learning can be used to learn a robust reactive control strategy that results in skillful peg-in-hole insertions.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

Country: North America > United States > Massachusetts > Hampshire County > Amherst (0.15)

Genre: Research Report > New Finding (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Reinforcement Learning Applied to Linear Quadratic Regulation

Bradtke, Steven J.

Recent research on reinforcement learning has focused on algorithms based on the principles of Dynamic Programming (DP). One of the most promising areas of application for these algorithms is the control of dynamical systems, and some impressive results have been achieved. However, there are significant gaps between practice and theory. In particular, there are no convergence proofs for problems with continuous state and action spaces, or for systems involving nonlinear function approximators (such as multilayer perceptrons). This paper presents research applying DP-based reinforcement learning theory to Linear Quadratic Regulation (LQR), an important class of control problems involving continuous state and action spaces and requiring a simple type of nonlinear function approximator. We describe an algorithm based on Q-learning that is proven to converge to the optimal controller for a large class of LQR problems. We also describe a slightly different algorithm that is only locally convergent to the optimal Q-function, demonstrating one of the possible pitfalls of using a nonlinear function approximator with DP-based learning.
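For LQR the Q-function of a linear policy is quadratic in state and action, which is what makes a simple approximator sufficient. The sketch below illustrates that structure only: it is model-based policy iteration on the quadratic Q-function for a scalar, undiscounted system, not the data-driven Q-learning algorithm the abstract describes. The function name, the scalar setting, and the example parameters are all assumptions.

```python
def lqr_q_policy_iteration(a, b, q, r, k0, iters=20):
    """Policy iteration on the quadratic Q-function of a scalar LQR problem.

    Assumed setting (illustrative, not from the paper):
      dynamics   x' = a*x + b*u
      stage cost q*x**2 + r*u**2, undiscounted
      policy     u = -k*x, with k0 stabilizing (|a - b*k0| < 1)
    For a policy k with cost-to-go P*x**2, the Q-function is
      Q(x, u) = H_xx*x**2 + 2*H_ux*x*u + H_uu*u**2.
    """
    k = k0
    for _ in range(iters):
        closed_loop = a - b * k
        if abs(closed_loop) >= 1.0:
            raise ValueError("policy is not stabilizing")
        # Policy evaluation: P solves P = q + r*k**2 + closed_loop**2 * P.
        P = (q + r * k * k) / (1.0 - closed_loop * closed_loop)
        # Quadratic Q-function coefficients for the evaluated policy.
        H_xx = q + a * a * P
        H_ux = a * b * P
        H_uu = r + b * b * P
        # Policy improvement: minimize Q over u, giving u = -(H_ux / H_uu) * x.
        k = H_ux / H_uu
    return k, P, (H_xx, H_ux, H_uu)

if __name__ == "__main__":
    k, P, H = lqr_q_policy_iteration(a=1.2, b=1.0, q=1.0, r=1.0, k0=0.5)
    print(f"gain k = {k:.4f}, cost coefficient P = {P:.4f}")
```

A learning variant would estimate the three `H` coefficients from observed transitions instead of computing them from `a` and `b`; the improvement step would stay the same.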

artificial intelligence, machine learning, reinforcement learning, (13 more...)

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.55)

Moore, Andrew W., Atkeson, Christopher G.

Memory-Based Reinforcement Learning: Efficient Computation with Prioritized Sweeping

We present a new algorithm, Prioritized Sweeping, for efficient prediction and control of stochastic Markov systems. Incremental learning methods such as Temporal Differencing and Q-learning have fast real time performance. Classical methods are slower, but more accurate, because they make full use of the observations. Prioritized Sweeping aims for the best of both worlds. It uses all previous experiences both to prioritize important dynamic programming sweeps and to guide the exploration of state space.
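The idea of prioritizing dynamic programming backups can be sketched with a priority queue keyed on Bellman error. This is a simplified illustration under stated assumptions: the transition model is given up front rather than learned from experience, exploration is omitted, and the corridor problem, function names, and parameters are hypothetical rather than taken from the paper.

```python
import heapq

def make_chain(n=5, p_right=0.8):
    """Hypothetical test problem: a noisy corridor whose last state is an absorbing goal."""
    P, R = [], []
    for s in range(n):
        if s == n - 1:
            # Absorbing goal: both actions self-loop with zero reward.
            P.append([[(1.0, s)], [(1.0, s)]])
            R.append([0.0, 0.0])
            continue
        left = [(1.0, max(s - 1, 0))]
        right = [(p_right, s + 1), (1.0 - p_right, s)]
        # Expected reward: 1 on entering the goal, 0 elsewhere.
        r_right = p_right if s + 1 == n - 1 else 0.0
        P.append([left, right])
        R.append([0.0, r_right])
    return P, R

def prioritized_sweeping(P, R, gamma=0.9, theta=1e-8, max_backups=10_000):
    """P[s][a] is a list of (prob, next_state); R[s][a] is the expected reward."""
    n = len(P)
    V = [0.0] * n
    # Predecessors of each state: states with some action that can reach it.
    preds = [set() for _ in range(n)]
    for s in range(n):
        for outcomes in P[s]:
            for prob, s2 in outcomes:
                if prob > 0:
                    preds[s2].add(s)

    def backup(s):
        return max(R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                   for a in range(len(P[s])))

    # Seed the queue with every state whose Bellman error exceeds theta.
    heap = []
    for s in range(n):
        err = abs(backup(s) - V[s])
        if err > theta:
            heapq.heappush(heap, (-err, s))

    backups = 0
    while heap and backups < max_backups:
        _, s = heapq.heappop(heap)  # largest queued error first
        V[s] = backup(s)
        backups += 1
        # Only predecessors of s can have had their Bellman error changed.
        for sp in preds[s]:
            err = abs(backup(sp) - V[sp])
            if err > theta:
                heapq.heappush(heap, (-err, sp))
    return V, backups

if __name__ == "__main__":
    P, R = make_chain()
    V, backups = prioritized_sweeping(P, R)
    print([round(v, 4) for v in V], "after", backups, "backups")
```

Stale queue entries are tolerated here for brevity: popping one simply triggers a backup that changes little. The design point is that work is spent where the value estimates are currently most wrong, rather than sweeping every state uniformly.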

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.15)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.85)

Q-Learning with Hidden-Unit Restarting

Anderson, Charles W.

Platt's resource-allocation network (RAN) (Platt, 1991a, 1991b) is modified for a reinforcement-learning paradigm and to "restart" existing hidden units rather than adding new units. After restarting, units continue to learn via back-propagation. The resulting restart algorithm is tested in a Q-learning network that learns to solve an inverted pendulum problem. Solutions are found faster on average with the restart algorithm than without it.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Country:

North America > United States > Massachusetts > Middlesex County (0.14)
North America > United States > Massachusetts > Hampshire County > Amherst (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Prioritized sweeping—Reinforcement learning with less data and less time

Moore, A. W. | Atkeson, C. G.

Classics Feb-1-1993

We present a new algorithm, prioritized sweeping, for efficient prediction and control of stochastic Markov systems. Incremental learning methods such as temporal differencing and Q-learning have real-time performance. Classical methods are slower, but more accurate, because they make full use of the observations. Prioritized sweeping uses all previous experiences both to prioritize important dynamic programming sweeps and to guide the exploration of state-space. We compare prioritized sweeping with other reinforcement learning schemes for a number of different stochastic optimal control problems.

artificial intelligence, machine learning, reinforcement learning

Classics

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)