Error Propagation for Approximate Policy and Value Iteration

Farahmand, Amir-massoud, Szepesvári, Csaba, Munos, Rémi

Dec-31-2010–Neural Information Processing Systems

We address the question of how the approximation error/Bellman residual at each iteration of the Approximate Policy/Value Iteration (API/AVI) algorithms influences the quality of the resulting policy. We quantify the performance loss in terms of the Lp norm of the approximation error/Bellman residual at each iteration. Moreover, we show that the performance loss depends on the expectation of the squared Radon-Nikodym derivative of a certain distribution rather than its supremum, in contrast to what previous results suggested. Our results also indicate that the contribution of the approximation/Bellman error to the performance loss is more prominent in the later iterations of API/AVI, and that the effect of an error term from the earlier iterations decays exponentially fast.
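The claim that early errors fade while late errors persist can be illustrated with a small numerical sketch. The code below is not the paper's analysis: it uses a toy random MDP, measures the sup-norm gap between value iterates rather than the Lp performance loss of the resulting policy, and does not involve the Radon-Nikodym coefficients. It relies only on the standard fact that the Bellman optimality operator is a gamma-contraction in the sup norm, so an error injected at iteration k is shrunk by at least a factor of gamma for every iteration that follows. All names (`make_mdp`, `value_iteration`, `deviation_from_error`) and all parameter values are assumptions chosen for illustration.

```python
import random


def make_mdp(n_states=5, n_actions=2, seed=0):
    """Build a random finite MDP (a toy assumption, not taken from the paper)."""
    rng = random.Random(seed)
    P = []  # P[a][s][t]: probability of moving from state s to t under action a
    R = []  # R[a][s]: reward for taking action a in state s
    for _ in range(n_actions):
        rows = []
        for _ in range(n_states):
            w = [rng.random() + 1e-3 for _ in range(n_states)]
            total = sum(w)
            rows.append([x / total for x in w])
        P.append(rows)
        R.append([rng.random() for _ in range(n_states)])
    return P, R


def bellman_optimality(V, P, R, gamma):
    """(T V)(s) = max_a [ R(a, s) + gamma * sum_t P(a, s, t) * V(t) ]."""
    n_states = len(V)
    return [
        max(
            R[a][s] + gamma * sum(P[a][s][t] * V[t] for t in range(n_states))
            for a in range(len(P))
        )
        for s in range(n_states)
    ]


def value_iteration(P, R, gamma, n_iters, error_at=None, error=None):
    """Run value iteration; optionally add `error` to the iterate after step `error_at`."""
    V = [0.0] * len(P[0])
    for k in range(n_iters):
        V = bellman_optimality(V, P, R, gamma)
        if k == error_at:
            V = [v + e for v, e in zip(V, error)]
    return V


def deviation_from_error(error_at, n_iters=30, gamma=0.9, seed=0):
    """Return (sup-norm gap between perturbed and clean final iterates, sup norm of the error)."""
    P, R = make_mdp(seed=seed)
    rng = random.Random(seed + 1)
    error = [rng.uniform(-1.0, 1.0) for _ in range(len(P[0]))]
    clean = value_iteration(P, R, gamma, n_iters)
    noisy = value_iteration(P, R, gamma, n_iters, error_at, error)
    gap = max(abs(a - b) for a, b in zip(noisy, clean))
    return gap, max(abs(e) for e in error)


if __name__ == "__main__":
    # The same error vector is injected at different iterations of a 30-step run.
    for k in (0, 10, 20, 29):
        gap, size = deviation_from_error(error_at=k)
        print(f"error at iteration {k:2d}: final gap = {gap:.6f} (error size {size:.3f})")
```

In this sketch an error injected at the final iteration survives in full, while the same error injected at the first iteration is bounded by gamma^29 times its size. This mirrors, in a much cruder norm, the qualitative behaviour described in the abstract.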

artificial intelligence, machine learning, reinforcement learning

Country:
- North America
  - United States (0.46)
  - Canada > Alberta (0.28)

Genre:
- Research Report > New Finding (0.34)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (0.94)
  - Machine Learning > Reinforcement Learning (0.69)
