Incremental Natural Actor-Critic Algorithms

Bhatnagar, Shalabh, Ghavamzadeh, Mohammad, Lee, Mark, Sutton, Richard S.

Dec-31-2008–Neural Information Processing Systems

We present four new reinforcement learning algorithms based on actor-critic and natural-gradient ideas, and provide their convergence proofs. Actor-critic reinforcement learningmethods are online approximations to policy iteration in which the value-function parameters are estimated using temporal difference learning and the policy parameters are updated by stochastic gradient descent. Methods based on policy gradients in this way are of special interest because of their compatibility withfunction approximation methods, which are needed to handle large or infinite state spaces. The use of temporal difference learning in this way is of interest because in many applications it dramatically reduces the variance of the gradient estimates. The use of the natural gradient is of interest because it can produce better conditioned parameterizations and has been shown to further reduce variancein some cases. Our results extend prior two-timescale convergence results for actor-critic methods by Konda and Tsitsiklis by using temporal difference learningin the actor and by incorporating natural gradients, and they extend prior empirical studies of natural actor-critic methods by Peters, Vijayakumar and Schaal by providing the first convergence proofs and the first fully incremental algorithms.

algorithm, approximation, gradient, (14 more...)

Neural Information Processing Systems

Dec-31-2008

Conferences PDF

Add feedback

Country:
- North America > Canada
  - Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)
- Asia > India
  - Karnataka > Bengaluru (0.04)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Reinforcement Learning (1.00)
  - Statistical Learning > Gradient Descent (0.54)

Duplicate Docs Excel Report

Title
Incremental Natural Actor-Critic Algorithms
Incremental Natural Actor-Critic Algorithms

Similar Docs Excel Report more

Title	Similarity	Source
None found