TD(0) Leads to Better Policies than Approximate Value Iteration

Dec-31-2006–Neural Information Processing Systems

We consider approximate value iteration with a parameterized approximator in which the state space is partitioned and the optimal cost-to-go function over each partition is approximated by a constant. We establish performance loss bounds for policies derived from approximations associated with fixed points. These bounds identify benefits to having projection weights equal to the invariant distribution of the resulting policy. Such projection weighting leads to the same fixed points as TD(0). Our analysis also leads to the first performance loss bound for approximate value iteration with an average cost objective.
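
As a rough illustration of the setup described in the abstract, the following sketch runs approximate value iteration on a small discounted-cost MDP with a piecewise-constant (state-aggregation) approximator. Everything in it is an assumption made for illustration: the function names, the four-state toy MDP, the discount factor, and the uniform projection weights are hypothetical and not taken from the paper. The sketch does not compute the invariant-distribution weighting that the abstract associates with TD(0); it only shows how a weighted projection onto partition-wise constants combines with the dynamic programming backup.

```python
def bellman_backup(J, costs, trans, alpha):
    """Apply the discounted-cost dynamic programming operator to J."""
    n = len(J)
    return [
        min(
            costs[a][x] + alpha * sum(trans[a][x][y] * J[y] for y in range(n))
            for a in range(len(costs))
        )
        for x in range(n)
    ]


def project(J, partition, weights, num_parts):
    """Weighted projection of J onto functions constant over each partition."""
    num = [0.0] * num_parts
    den = [0.0] * num_parts
    for x, j in enumerate(partition):
        num[j] += weights[x] * J[x]
        den[j] += weights[x]
    # Assumes every partition is non-empty and has positive total weight.
    return [num[j] / den[j] for j in range(num_parts)]


def approximate_value_iteration(costs, trans, alpha, partition, weights, iters=500):
    """Iterate r <- project(backup(expand(r))) from r = 0."""
    num_parts = max(partition) + 1
    r = [0.0] * num_parts
    for _ in range(iters):
        J = [r[j] for j in partition]  # expand to one value per state
        r = project(bellman_backup(J, costs, trans, alpha), partition, weights, num_parts)
    return r


# Hypothetical toy problem: 4 states, 2 actions, 2 partitions.
COSTS = [
    [1.0, 2.0, 0.0, 3.0],  # cost of action 0 in each state
    [2.0, 0.5, 1.0, 1.0],  # cost of action 1 in each state
]
TRANS = [
    [
        [0.5, 0.5, 0.0, 0.0],
        [0.0, 0.5, 0.5, 0.0],
        [0.0, 0.0, 0.5, 0.5],
        [0.5, 0.0, 0.0, 0.5],
    ],
    [
        [0.25, 0.25, 0.25, 0.25],
        [0.25, 0.25, 0.25, 0.25],
        [0.25, 0.25, 0.25, 0.25],
        [0.25, 0.25, 0.25, 0.25],
    ],
]
ALPHA = 0.9
PARTITION = [0, 0, 1, 1]            # states 0,1 share a value; so do states 2,3
WEIGHTS = [0.25, 0.25, 0.25, 0.25]  # uniform weights, an arbitrary choice

r = approximate_value_iteration(COSTS, TRANS, ALPHA, PARTITION, WEIGHTS)
print(r)
```

Because the weighted average is a non-expansion in the maximum norm and the backup is a contraction with modulus `ALPHA`, the iteration in this sketch settles at a fixed point. Swapping `WEIGHTS` for a different positive vector changes which fixed point is reached, which is the design choice the abstract's bounds are concerned with.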

artificial intelligence, machine learning, reinforcement learning

Country:
- North America > United States
  - Massachusetts > Middlesex County (0.14)
  - California > Santa Clara County (0.14)

Genre:
- Research Report (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Reinforcement Learning (1.00)
  - Representation & Reasoning (0.92)
