Model-Free Least-Squares Policy Iteration

Lagoudakis, Michail G.; Parr, Ronald

Neural Information Processing Systems 

We propose a new approach to reinforcement learning which combines least-squares function approximation with policy iteration. Our method is model-free and completely off-policy. We are motivated by the least-squares temporal difference learning algorithm (LSTD), which is known for its efficient use of sample experiences compared to pure temporal difference algorithms. LSTD is ideal for prediction problems; however, it has heretofore had no straightforward application to control problems. Moreover, approximations learned by LSTD are strongly influenced by the visitation distribution over states.
