Convergence of Indirect Adaptive Asynchronous Value Iteration Algorithms

Gullapalli, Vijaykumar, Barto, Andrew G.

Dec-31-1994–Neural Information Processing Systems

Reinforcement Learning methods based on approximating dynamic programming (DP) are receiving increased attention due to their utility in forming reactive control policies for systems embedded in dynamic environments. Environments are usually modeled as controlled Markov processes, but when the environment model is not known a priori, adaptive methods are necessary. Adaptive control methods are often classified as being direct or indirect. Direct methods directly adapt the control policy from experience, whereas indirect methods adapt a model of the controlled process and compute control policies based on the latest model. Our focus is on indirect adaptive DP-based methods in this paper. We present a convergence result for indirect adaptive asynchronous value iteration algorithms for the case in which a lookup table is used to store the value function. Our result implies convergence of several existing reinforcement learning algorithms such as adaptive real-time dynamic programming (ARTDP) (Barto, Bradtke, & Singh, 1993) and prioritized sweeping (Moore & Atkeson, 1993). Although the emphasis of researchers studying DP-based reinforcement learning has been on direct adaptive methods such as Q-Learning (Watkins, 1989) and methods using TD algorithms (Sutton, 1988), it is not clear that these direct methods are preferable in practice to indirect methods such as those analyzed in this paper.
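To make the indirect scheme described above concrete, the following is a minimal sketch and not the paper's algorithm as stated, since the abstract gives no pseudocode. It keeps transition counts as the adapted model, stores the value function in a lookup table, and after each observed transition backs up only the current state using the latest model estimate. The function name `indirect_adaptive_avi`, the uniform count prior, the purely random exploration, and the assumption that rewards are known are all illustrative choices.

```python
import random


def indirect_adaptive_avi(true_P, R, gamma=0.9, steps=5000, seed=0):
    """Sketch of indirect adaptive asynchronous value iteration.

    true_P[s][a] is a list of next-state probabilities. It is used only to
    simulate the environment and is never read by the learner.
    R[s][a] is the immediate reward, assumed known here (an assumption).
    """
    rng = random.Random(seed)
    n_states, n_actions = len(true_P), len(true_P[0])

    # Model: transition counts, initialised to 1 (a uniform prior, chosen
    # for illustration so that early estimates are well defined).
    counts = [[[1] * n_states for _ in range(n_actions)]
              for _ in range(n_states)]
    V = [0.0] * n_states  # lookup-table value function

    def backed_up_value(s, a):
        # One-step lookahead under the current estimated model.
        total = sum(counts[s][a])
        expected = sum(counts[s][a][s2] / total * V[s2]
                       for s2 in range(n_states))
        return R[s][a] + gamma * expected

    s = 0
    for _ in range(steps):
        a = rng.randrange(n_actions)  # random exploration (illustrative)
        s_next = rng.choices(range(n_states), weights=true_P[s][a])[0]
        counts[s][a][s_next] += 1     # adapt the model from experience
        # Asynchronous update: back up only the state just visited.
        V[s] = max(backed_up_value(s, b) for b in range(n_actions))
        s = s_next
    return V


# Hypothetical two-state example: staying in state 1 pays reward 1.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
R = [[0.0, 0.0], [1.0, 0.0]]
print(indirect_adaptive_avi(P, R))
```

A direct method such as Q-Learning would skip the `counts` table and update value estimates from each sampled transition alone; here the sample only refines the model, and the value update is computed from that model.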

artificial intelligence, machine learning, reinforcement learning
