AITopics

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.55)

Moore, Andrew W., Atkeson, Christopher G.

Memory-Based Reinforcement Learning: Efficient Computation with Prioritized Sweeping

Neural Information Processing SystemsDec-31-1993

We present a new algorithm, Prioritized Sweeping, for efficient prediction and control of stochastic Markov systems. Incremental learning methods such as Temporal Differencing and Q-Iearning have fast real time performance. Classicalmethods are slower, but more accurate, because they make full use of the observations. Prioritized Sweeping aims for the best of both worlds. It uses all previous experiences both to prioritize important dynamicprogramming sweeps and to guide the exploration of statespace.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.15)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.85)

Neural Information Processing SystemsDec-31-1993

Q-Learning with Hidden-Unit Restarting

Anderson, Charles W.

Platt's resource-allocation network (RAN) (Platt, 1991a, 1991b) is modified for a reinforcement-learning paradigm and to "restart" existing hidden units rather than adding new units. After restarting, unitscontinue to learn via back-propagation. The resulting restart algorithm is tested in a Q-Iearning network that learns to solve an inverted pendulum problem. Solutions are found faster on average with the restart algorithm than without it.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Country:

North America > United States > Massachusetts > Middlesex County (0.14)
North America > United States > Massachusetts > Hampshire County > Amherst (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Prioritized sweeping—Reinforcement learning with less data and less time

Moore, A. W. | Atkeson, C. G.

ClassicsFeb-1-1993

We present a new algorithm,prioritized sweeping, for efficient prediction and control of stochastic Markov systems. Incremental learning methods such as temporal differencing and Q-learning have real-time performance. Classical methods are slower, but more accurate, because they make full use of the observations. It uses all previous experiences both to prioritize important dynamic programming sweeps and to guide the exploration of state-space. We compare prioritized sweeping with other reinforcement learning schemes for a number of different stochastic optimal control problems.

artificial intelligence, machine learning, reinforcement learning

Classics

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Williams, R. J. | Baird, L. C. I.

Tight performance bounds on greedy policies based on imperfect value functions

ClassicsFeb-1-1993

Reinforcement learning is an effective technique for learning action policies in discrete stochastic environments, but its efficiency can decay exponentially with the size of the state space. In many situations significant portions of a large state space may be irrelevant to a specific goal and can be aggregated into a few, relevant, states. The U Tree algorithm generates a tree based state discretization that efficiently finds the relevant state chunks of large propositional domains. In this paper, we extend the U Tree algorithm to challenging domains with a continuous state space for which there is no initial discretization.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

Classics

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(3 more...)

Genre: Research Report (0.68)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.70)

Fast Learning with Predictive Forward Models

Brody, Carlos

A method for transforming performance evaluation signals distal both in space and time into proximal signals usable by supervised learning algorithms, presented in [Jordan & Jacobs 90], is examined. A simple observation concerning differentiation through models trained with redundant inputs (as one of their networks is) explains a weakness in the original architecture and suggests a modification: an internal world model that encodes action-space exploration and, crucially, cancels input redundancy to the forward model is added. Learning time on an example task, cartpole balancing, is thereby reduced about 50 to 100 times. 1 INTRODUCTION In many learning control problems, the evaluation used to modify (and thus improve) control may not be available in terms of the controller's output: instead, it may be in terms of a spatial transformation of the controller's output variables (in which case we shall term it as being "distal in space"), or it may be available only several time steps into the future (termed as being "distal in time"). For example, control of a robot arm may be exerted in terms of joint angles, while evaluation may be in terms of the endpoint cartesian coordinates; furthermore, we may only wish to evaluate the endpoint coordinates reached after a certain period of time: the co- ·Current address: Computation and Neural Systems Program, California Institute of Technology, Pasadena CA.

controller, evaluation, intermediate model, (15 more...)

Country:

Asia > Middle East > Jordan (0.30)
North America > United States > California > Los Angeles County > Pasadena (0.24)
North America > United States > Massachusetts (0.04)
North America > Mexico > Mexico City (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.67)

Prescott, Tony J., Mayhew, John E. W.

Obstacle Avoidance through Reinforcement Learning

Following Simon's (1969) observation that complex behaviour may simply be the reflection of a complex environment a number of researchers (eg.

obstacle avoidance, reinforcement, sprite, (13 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.15)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Practical Issues in Temporal Difference Learning

Tesauro, Gerald

TO('\) is an algorithm for adjusting the weights in a connectionist network which 259 260 Tesauro has the following form:

application, practical issue, tesauro, (14 more...)

Country: Europe > Netherlands > North Holland > Amsterdam (0.04)

Industry: Leisure & Entertainment > Games > Backgammon (0.49)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

The Efficient Learning of Multiple Task Sequences

Singh, Satinder P.

I present a modular network architecture and a learning algorithm based on incremental dynamic programming that allows a single learning agent to learn to solve multiple Markovian decision tasks (MDTs) with significant transfer of learning across the tasks. I consider a class of MDTs, called composite tasks, formed by temporally concatenating a number of simpler, elemental MDTs. The architecture is trained on a set of composite and elemental MDTs. The temporal structure of a composite task is assumed to be unknown and the architecture learns to produce a temporal decomposition. It is shown that under certain conditions the solution of a composite MDT can be constructed by computationally inexpensive modifications of the solutions of its constituent elemental MDTs. 1 INTRODUCTION Most applications of domain independent learning algorithms have focussed on learning single tasks. Building more sophisticated learning agents that operate in complex environments will require handling multiple tasks/goals (Singh, 1992). Research effort on the scaling problem has concentrated on discovering faster learning algorithms, and while that will certainly help, techniques that allow transfer of learning across tasks will be indispensable for building autonomous learning agents that have to learn to solve multiple tasks. In this paper I consider a learning agent that interacts with an external, finite-state, discrete-time, stochastic dynamical environment and faces multiple sequences of Markovian decision tasks (MDTs).

architecture, composite task, elemental task, (17 more...)

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.29)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > San Mateo County > San Mateo (0.04)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)

Fast Learning with Predictive Forward Models

Brody, Carlos

A method for transforming performance evaluation signals distal both in space and time into proximal signals usable by supervised learning algorithms, presented in [Jordan & Jacobs 90], is examined. A simple observation concerning differentiation through models trained with redundant inputs (as one of their networks is) explains a weakness in the original architecture and suggests a modification: an internal world model that encodes action-space exploration and, crucially, cancels input redundancy to the forward model is added. Learning time on an example task, cartpole balancing, is thereby reduced about 50 to 100 times. 1 INTRODUCTION In many learning control problems, the evaluation used to modify (and thus improve) control may not be available in terms of the controller's output: instead, it may be in terms of a spatial transformation of the controller's output variables (in which case we shall term it as being "distal in space"), or it may be available only several time steps into the future (termed as being "distal in time"). For example, control of a robot arm may be exerted in terms of joint angles, while evaluation may be in terms of the endpoint cartesian coordinates; furthermore, we may only wish to evaluate the endpoint coordinates reached after a certain period of time: the co- ·Current address: Computation and Neural Systems Program, California Institute of Technology, Pasadena CA.

controller, evaluation, intermediate model, (15 more...)