AITopics

Country: North America > United States > Massachusetts > Hampshire County > Amherst (0.29)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)

Simple statistical gradient-following algorithms for connectionist reinforcement learning

Williams, R. J.

ClassicsFeb-1-1992

This article presents a general class of associative reinforcement learning algorithms for connectionist networks containing stochastic units. These algorithms, called REINFORCE algorithms, are shown to make weight adjustments in a direction that lies along the gradient of expected reinforcement in both immediate-reinforcement tasks and certain limited forms of delayed-reinforcement tasks, and they do this without explicitly computing gradient estimates or even storing information from which such estimates could be computed. Specific examples of such algorithms are presented, some of which bear a close relationship to certain existing algorithms while others are novel but potentially interesting in their own right. Also given are results that show how such algorithms can be naturally integrated with backpropagation. We close with a brief discussion of a number of additional issues surrounding the use of such algorithms, including what is known about their limiting behaviors as well as further considerations that might be used to help develop similar but potentially more powerful reinforcement learning algorithms.

artificial intelligence, machine learning, reinforcement learning, (3 more...)

Classics

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Practical issues in temporal difference learning

Tesauro, G.

ClassicsFeb-1-1992

This paper examines whether temporal difference methods for training connectionist networks, such as Sutton's TD(λ) algorithm, can be successfully applied to complex real-world problems. A number of important practical issues are identified and discussed from a general theoretical perspective. These practical issues are then examined in the context of a case study in which TD(λ) is applied to learning the game of backgammon from the outcome of self-play. This is apparently the first application of this algorithm to a complex non-trivial task. It is found that, with zero knowledge built in, the network is able to learn from scratch to play the entire game at a fairly strong intermediate level of performance, which is clearly better than conventional commercial programs, and which in fact surpasses comparable networks trained on a massive human expert data set.

artificial intelligence, reinforcement learning, temporal difference, (4 more...)

Classics

Industry: Leisure & Entertainment > Games > Backgammon (0.65)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

The convergence of TD(lambda) for general lambda

Dayan, P.

ClassicsFeb-1-1992

Comments:!! See this http URL for any accompanying files

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Classics

Genre: Research Report (0.68)

Industry: Information Technology > Security & Privacy (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)

Integrated Modeling and Control Based on Reinforcement Learning and Dynamic Programming

Sutton, Richard S.

This is a summary of results with Dyna, a class of architectures for intelligent systems based on approximating dynamic programming methods. Dyna architectures integrate trial-and-error (reinforcement) learning and execution-time planning into a single process operating alternately on the world and on a learned forward model of the world. We describe and show results for two Dyna architectures, Dyna-AHC and Dyna-Q. Using a navigation task, results are shown for a simple Dyna-AHC system which simultaneously learns by trial and error, learns a world model, and plans optimal routes using the evolving world model. We show that Dyna-Q architectures (based on Watkins's Q-Iearning) are easy to adapt for use in changing environments.

architecture, evaluation function, world model, (14 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Waltham (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Iran (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Reinforcement Learning in Markovian and Non-Markovian Environments

Schmidhuber, Jürgen

This work addresses three problems with reinforcement learning and adaptive neuro-control: 1. Non-Markovian interfaces between learner and environment.

algorithm, controller, model network, (13 more...)

Country:

Asia > Middle East > Jordan (0.05)
North America > United States > Massachusetts (0.04)
North America > United States > California > San Mateo County > San Mateo (0.04)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)

A Reinforcement Learning Variant for Control Scheduling

Guha, Aloke

However, a large class of continuous control problems require maintaining the system at a desired operating point, or setpoint, at a given time. We refer to this problem as the basic setpoint control problem [Guha 90], and have shown that reinforcement learning can be used, not surprisingly, quite well for such control tasks.

controller, reinforcement, setpoint, (14 more...)

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > District of Columbia > Washington (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Integrated Modeling and Control Based on Reinforcement Learning and Dynamic Programming

Sutton, Richard S.

This is a summary of results with Dyna, a class of architectures for intelligent systems based on approximating dynamic programming methods. Dyna architectures integrate trial-and-error (reinforcement) learning and execution-time planning into a single process operating alternately on the world and on a learned forward model of the world. We describe and show results for two Dyna architectures, Dyna-AHC and Dyna-Q. Using a navigation task, results are shown for a simple Dyna-AHC system which simultaneously learns by trial and error, learns a world model, and plans optimal routes using the evolving world model. We show that Dyna-Q architectures (based on Watkins's Q-Iearning) are easy to adapt for use in changing environments.

architecture, evaluation function, world model, (14 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Waltham (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Iran (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Navigating through Temporal Difference

Dayan, Peter

Barto, Sutton and Watkins [2] introduced a grid task as a didactic example of temporal difference planning and asynchronous dynamical pre gramming. This paper considers the effects of changing the coding of the input stimulus, and demonstrates that the self-supervised learning of a particular form of hidden unit representation improves performance.

agent, prediction, representation, (16 more...)

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
(2 more...)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Navigating through Temporal Difference

Dayan, Peter

Barto, Sutton and Watkins [2] introduced a grid task as a didactic example of temporal difference planning and asynchronous dynamical pre gramming. This paper considers the effects of changing the coding of the input stimulus, and demonstrates that the self-supervised learning of a particular form of hidden unit representation improves performance.

agent, prediction, representation, (16 more...)