TD(0) Leads to Better Policies than Approximate Value Iteration
–Neural Information Processing Systems
We consider approximate value iteration with a parameterized approximator in which the state space is partitioned and the optimal cost-to-go function over each partition is approximated by a constant. We establish performance loss bounds for policies derived from approximations associated with fixed points. These bounds identify benefits to having projection weights equal to the invariant distribution of the resulting policy. Such projection weighting leads to the same fixed points as TD(0). Our analysis also leads to the first performance loss bound for approximate value iteration with an average cost objective.
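To make the setting concrete, here is a minimal sketch of approximate value iteration with a piecewise-constant approximator. It is an illustration rather than the paper's algorithm as stated: it assumes a discounted-cost objective, a small randomly generated MDP, and fixed, strictly positive projection weights (the abstract's preferred weights, the invariant distribution of the resulting policy, depend on the fixed point itself and are not computed here). The names `bellman` and `aggregated_value_iteration` are hypothetical.

```python
import numpy as np

def bellman(J, P, g, alpha):
    """Apply the discounted-cost Bellman operator to a cost-to-go vector J.

    P has shape (actions, states, states); g has shape (actions, states).
    """
    Q = g + alpha * (P @ J)  # Q[a, x] = g[a, x] + alpha * sum_y P[a, x, y] * J[y]
    return Q.min(axis=0)     # minimize cost over actions

def aggregated_value_iteration(P, g, alpha, partition, weights, iters=500):
    """Iterate r <- Proj(T(Phi r)) with one constant per partition.

    partition[x] is the index of the partition containing state x.
    weights[x] > 0 is the projection weight of state x (assumed fixed).
    """
    partition = np.asarray(partition)
    weights = np.asarray(weights, dtype=float)
    K = partition.max() + 1
    mass = np.bincount(partition, weights=weights, minlength=K)
    r = np.zeros(K)
    for _ in range(iters):
        TJ = bellman(r[partition], P, g, alpha)  # expand r to all states, apply T
        # Weighted projection: weighted average of TJ over each partition.
        r = np.bincount(partition, weights=weights * TJ, minlength=K) / mass
    return r

# Illustrative problem (all values are assumptions for this sketch).
rng = np.random.default_rng(0)
P = rng.random((2, 6, 6))
P /= P.sum(axis=2, keepdims=True)       # make each row a probability distribution
g = rng.random((2, 6))                  # per-stage costs
alpha = 0.9                             # discount factor
partition = np.array([0, 0, 1, 1, 2, 2])
weights = np.full(6, 1.0 / 6.0)         # uniform projection weights

r = aggregated_value_iteration(P, g, alpha, partition, weights)
# Greedy policy with respect to the piecewise-constant approximation.
policy = (g + alpha * (P @ r[partition])).argmin(axis=0)
```

The weighted averaging step is non-expansive in the maximum norm, so its composition with the discounted Bellman operator is a contraction and the iteration converges to a fixed point. Which fixed point it reaches, and hence which greedy policy results, depends on the choice of `weights`.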
Dec-31-2006