Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning

Greensmith, Evan, Bartlett, Peter L., Baxter, Jonathan

Dec-31-2002, Neural Information Processing Systems

We consider the use of two additive control variate methods to reduce the variance of performance gradient estimates in reinforcement learning problems. The first approach we consider is the baseline method, in which a function of the current state is added to the discounted value estimate. We relate the performance of these methods, which use sample paths, to the variance of estimates based on iid data. We derive the baseline function that minimizes this variance, and we show that the variance for any baseline is the sum of the optimal variance and a weighted squared distance to the optimal baseline. We show that the widely used average discounted value baseline (where the reward is replaced by the difference between the reward and its expectation) is suboptimal.
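
The variance decomposition described above can be illustrated in the simplest i.i.d. case. The following is a minimal sketch, not the paper's sample-path setting: it assumes a hypothetical one-step problem with two actions, a scalar policy parameter, a sigmoid policy, and deterministic rewards, and it uses the score-function estimator score(a) * (reward(a) - b) with a constant baseline b. All names and numbers (`gradient_variance`, `p`, `rewards`, and so on) are illustrative assumptions. Under these assumptions the variance-minimizing constant baseline is E[score^2 * reward] / E[score^2], and the variance for any other baseline exceeds the minimum by E[score^2] * (b - b_opt)^2.

```python
import numpy as np


def gradient_variance(probs, scores, rewards, baseline):
    """Exact variance of the estimator score(a) * (reward(a) - baseline)."""
    g = scores * (rewards - baseline)
    mean = np.sum(probs * g)
    return np.sum(probs * g**2) - mean**2


def gradient_mean(probs, scores, rewards, baseline):
    """Exact expectation of the same estimator (should not depend on baseline)."""
    return np.sum(probs * scores * (rewards - baseline))


# Hypothetical two-action problem: action 1 is chosen with probability p.
p = 0.8
probs = np.array([1 - p, p])
# Score d/dtheta log pi(a) for a sigmoid policy pi(1) = sigmoid(theta) = p.
scores = np.array([-p, 1 - p])
# Assumed deterministic rewards for actions 0 and 1.
rewards = np.array([0.0, 1.0])

# Weight E[score^2] and the variance-minimizing constant baseline.
weight = np.sum(probs * scores**2)
b_opt = np.sum(probs * scores**2 * rewards) / weight

# Expected-reward baseline (reward replaced by reward minus its expectation).
b_avg = np.sum(probs * rewards)

var_opt = gradient_variance(probs, scores, rewards, b_opt)
var_avg = gradient_variance(probs, scores, rewards, b_avg)

# Excess variance predicted by the weighted squared distance to b_opt.
predicted_excess = weight * (b_avg - b_opt) ** 2

print("optimal baseline:", b_opt, "variance:", var_opt)
print("expected-reward baseline:", b_avg, "variance:", var_avg)
print("predicted excess variance:", predicted_excess)
```

In this toy problem the expected-reward baseline differs from the variance-minimizing one, so it carries the extra weighted squared-distance term, while the expected gradient is the same for both baselines.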

baseline, value function, variance

Country:
- North America > United States
  - Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > United Kingdom
  - England > Oxfordshire > Oxford (0.04)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Reinforcement Learning (0.72)
  - Learning Graphical Models > Undirected Networks
    - Markov Models (0.74)
