Reinforcement Learning
Integrated Modeling and Control Based on Reinforcement Learning and Dynamic Programming
This is a summary of results with Dyna, a class of architectures for intelligent systems based on approximating dynamic programming methods. Dyna architectures integrate trial-and-error (reinforcement) learning and execution-time planning into a single process operating alternately on the world and on a learned forward model of the world. We describe and show results for two Dyna architectures, Dyna-AHC and Dyna-Q. Using a navigation task, results are shown for a simple Dyna-AHC system which simultaneously learns by trial and error, learns a world model, and plans optimal routes using the evolving world model. We show that Dyna-Q architectures (based on Watkins's Q-Iearning) are easy to adapt for use in changing environments.
Navigating through Temporal Difference
Barto, Sutton and Watkins [2] introduced a grid task as a didactic example of temporal difference planning and asynchronous dynamical pre gramming. This paper considers the effects of changing the coding of the input stimulus, and demonstrates that the self-supervised learning of a particular form of hidden unit representation improves performance.
Navigating through Temporal Difference
Barto, Sutton and Watkins [2] introduced a grid task as a didactic example of temporal difference planning and asynchronous dynamical pre gramming. This paper considers the effects of changing the coding of the input stimulus, and demonstrates that the self-supervised learning of a particular form of hidden unit representation improves performance.
A Reinforcement Learning Variant for Control Scheduling
However, a large class of continuous control problems require maintaining the system at a desired operating point, or setpoint, at a given time. We refer to this problem as the basic setpoint control problem [Guha 90], and have shown that reinforcement learning can be used, not surprisingly, quite well for such control tasks.
Integrated Modeling and Control Based on Reinforcement Learning and Dynamic Programming
This is a summary of results with Dyna, a class of architectures for intelligent systemsbased on approximating dynamic programming methods. Dyna architectures integrate trial-and-error (reinforcement) learning and execution-time planning into a single process operating alternately on the world and on a learned forward model of the world. We describe and show results for two Dyna architectures, Dyna-AHC and Dyna-Q. Using a navigation task, results are shown for a simple Dyna-AHC system which simultaneously learns by trial and error, learns a world model, and plans optimal routes using the evolving world model. We show that Dyna-Q architectures (based on Watkins's Q-Iearning) are easy to adapt for use in changing environments.
A Reinforcement Learning Variant for Control Scheduling
However, a large class of continuous control problems require maintaining the system at a desired operating point, or setpoint, at a given time. We refer to this problem as the basic setpoint control problem [Guha 90], and have shown that reinforcement learning can be used, not surprisingly, quite well for such control tasks. A more general version of the same problem requires steering the system from some 479 480 Guha initial or starting state to a desired state or setpoint at specific times without knowledge of the dynamics of the system. We therefore wish to examine how control scheduling tasks, where the system must be steered through a sequence of setpoints at specific times.
Navigating through Temporal Difference
Barto, Sutton and Watkins [2] introduced a grid task as a didactic example oftemporal difference planning and asynchronous dynamical pre gramming. Thispaper considers the effects of changing the coding of the input stimulus, and demonstrates that the self-supervised learning of a particular form of hidden unit representation improves performance.
Sequential Decision Problems and Neural Networks
Barto, A. G., Sutton, R. S., Watkins, C. J. C. H.
Decision making tasks that involve delayed consequences are very common yet difficult to address with supervised learning methods. If there is an accurate model of the underlying dynamical system, then these tasks can be formulated as sequential decision problems and solved by Dynamic Programming. This paper discusses reinforcement learning in terms of the sequential decision framework and shows how a learning algorithm similar to the one implemented by the Adaptive Critic Element used in the pole-balancer of Barto, Sutton, and Anderson (1983), and further developed by Sutton (1984), fits into this framework. Adaptive neural networks can play significant roles as modules for approximating the functions required for solving sequential decision problems.