TD_gamma: Re-evaluating Complex Backups in Temporal Difference Learning

Konidaris, George, Niekum, Scott, Thomas, Philip S.

Dec-31-2011–Neural Information Processing Systems

We show that the lambda-return target used in the TD(lambda) family of algorithms is the maximum likelihood estimator for a specific model of how the variance of an n-step return estimate increases with n. We introduce the gamma-return estimator, an alternative target based on a more accurate model of variance, which defines the TD_gamma family of complex-backup temporal difference learning algorithms. We derive TD_gamma, the gamma-return equivalent of the original TD(lambda) algorithm, which eliminates the lambda parameter but can only perform updates at the end of an episode and requires time and space proportional to the episode length. We then derive a second algorithm, TD_gamma(C), with a capacity parameter C. TD_gamma(C) requires C times more time and memory than TD(lambda) and is incremental and online. We show that TD_gamma outperforms TD(lambda) for any setting of lambda on 4 out of 5 benchmark domains, and that TD_gamma(C) performs as well as or better than TD_gamma for intermediate settings of C.
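The maximum likelihood claim in the abstract can be illustrated with a short sketch. If the n-step returns are treated as independent Gaussian estimates of the same value, the maximum likelihood combination is the inverse-variance weighted average, so each variance model fixes a set of weights. The abstract does not give either variance model explicitly; both models below are assumptions made for illustration, and every function name is hypothetical. The geometric model var(n) proportional to lambda^(-n) yields weights proportional to lambda^n, the shape of the lambda-return weights (here normalized over a finite episode, which differs from the usual episodic lambda-return that assigns the leftover tail weight to the final return). The accumulating model is one plausible way for variance to grow with n and should not be read as the paper's exact definition.

```python
def n_step_returns(rewards, values, gamma):
    """Return the n-step returns from the first state, for n = 1..T.

    rewards[i] is the reward received after step i; values[i] is the current
    estimate V(s_i). values must have one more entry than rewards, with the
    last entry being the value of the final (e.g. terminal) state.
    """
    returns = []
    discounted_sum = 0.0
    for n in range(1, len(rewards) + 1):
        discounted_sum += gamma ** (n - 1) * rewards[n - 1]
        returns.append(discounted_sum + gamma ** n * values[n])
    return returns


def inverse_variance_weights(variances):
    """Normalized weights proportional to 1 / variance."""
    inverse = [1.0 / v for v in variances]
    total = sum(inverse)
    return [x / total for x in inverse]


def combine(returns, variances):
    """Inverse-variance weighted average of the n-step returns."""
    weights = inverse_variance_weights(variances)
    return sum(w * r for w, r in zip(weights, returns))


def geometric_variances(lam, horizon):
    """Assumed model: var(n) proportional to lam ** (-n), for 0 < lam <= 1."""
    return [lam ** (-n) for n in range(1, horizon + 1)]


def accumulating_variances(gamma, horizon):
    """Assumed model: var(n) proportional to sum_{i=1..n} gamma ** (2 * (i - 1))."""
    variances = []
    accumulated = 0.0
    for n in range(1, horizon + 1):
        accumulated += gamma ** (2 * (n - 1))
        variances.append(accumulated)
    return variances


# Toy episode: three steps with reward 1 each and all value estimates at zero.
rewards = [1.0, 1.0, 1.0]
values = [0.0, 0.0, 0.0, 0.0]
gamma = 1.0

returns = n_step_returns(rewards, values, gamma)
lambda_style = combine(returns, geometric_variances(0.5, len(returns)))
gamma_style = combine(returns, accumulating_variances(gamma, len(returns)))
```

Keeping the variance model separate from the weighting step makes the contrast in the abstract concrete: the two targets differ only in the assumed growth of variance with n. The accumulating model has no free parameter beyond the discount factor, which is consistent with the abstract's statement that the lambda parameter is eliminated.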

algorithm, artificial intelligence, bayesian inference

Country:
- North America > United States > Massachusetts
  - Hampshire County > Amherst (0.14)
  - Middlesex County > Cambridge (0.14)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning
    - Learning Graphical Models > Directed Networks
      - Bayesian Learning (0.35)
    - Reinforcement Learning (1.00)
  - Representation & Reasoning > Uncertainty
    - Bayesian Inference (0.35)
