TD(0) Leads to Better Policies than Approximate Value Iteration

Van Roy, Benjamin

Neural Information Processing Systems 

We consider approximate value iteration with a parameterized approximator in which the state space is partitioned and the optimal cost-to-go function over each partition is approximated by a constant. We establish performance loss bounds for policies derived from approximations associated with fixed points. These bounds identify benefits to having projection weights equal to the invariant distribution of the resulting policy. Such projection weighting leads to the same fixed points as TD(0). Our analysis also leads to the first performance loss bound for approximate value iteration with an average cost objective.
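
As a rough illustration of the setup described in the abstract, the following is a minimal sketch of approximate value iteration with a piecewise-constant approximator. It is not taken from the paper. It assumes a small finite MDP with a discounted cost objective, and the names `aggregated_value_iteration`, `P`, `g`, `partition`, `weights`, and `alpha` are all hypothetical. The projection weights are held fixed here; the abstract's case of weights equal to the invariant distribution of the resulting policy would require recomputing them, which this sketch does not attempt.

```python
def aggregated_value_iteration(P, g, partition, weights, alpha, iters=500):
    """Iterate r <- Pi T Phi r for a piecewise-constant approximator.

    All argument names are illustrative assumptions:
      P[a][s][t]   transition probability from state s to t under action a
      g[a][s]      one-stage cost of action a in state s
      partition[s] index of the partition (cell) containing state s
      weights[s]   positive projection weight of state s
      alpha        discount factor in (0, 1)
    """
    n_states = len(partition)
    n_cells = max(partition) + 1
    n_actions = len(P)
    r = [0.0] * n_cells  # one constant per partition

    for _ in range(iters):
        # Expand the per-partition constants into a cost-to-go over states.
        J = [r[partition[s]] for s in range(n_states)]

        # Bellman backup for cost minimization.
        TJ = [
            min(
                g[a][s] + alpha * sum(P[a][s][t] * J[t] for t in range(n_states))
                for a in range(n_actions)
            )
            for s in range(n_states)
        ]

        # Weighted projection: average the backed-up values within each cell.
        num = [0.0] * n_cells
        den = [0.0] * n_cells
        for s in range(n_states):
            num[partition[s]] += weights[s] * TJ[s]
            den[partition[s]] += weights[s]
        r = [num[k] / den[k] for k in range(n_cells)]

    return r


# Two absorbing states, one action: state 0 costs 1 per step, state 1 costs 0.
P = [[[1.0, 0.0], [0.0, 1.0]]]
g = [[1.0, 0.0]]

# Each state in its own partition: this reduces to exact value iteration.
exact = aggregated_value_iteration(P, g, [0, 1], [0.5, 0.5], alpha=0.5)

# Both states in one partition: a single constant approximates both values.
merged = aggregated_value_iteration(P, g, [0, 0], [0.5, 0.5], alpha=0.5)
```

With singleton partitions the iteration is ordinary value iteration. With a single merged partition the fixed point depends on the projection weights, which is the dependence the abstract's bounds are concerned with.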
