Model-Free Least-Squares Policy Iteration

Dec-31-2002–Neural Information Processing Systems

We propose a new approach to reinforcement learning that combines least-squares function approximation with policy iteration. Our method is model-free and completely off-policy. We are motivated by the least-squares temporal difference learning algorithm (LSTD), which is known for using sample experience more efficiently than pure temporal difference algorithms. LSTD is ideal for prediction problems; however, it has not previously had a straightforward application to control problems. Moreover, approximations learned by LSTD are strongly influenced by the visitation distribution over states.
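For readers unfamiliar with LSTD, the following is a minimal sketch of the standard LSTD prediction step that the abstract cites as motivation. It is not the policy iteration method proposed in the paper. The names `lstd`, `solve_linear`, and `one_hot`, the two-state chain, and the discount factor of 0.5 are all assumptions made for illustration. The sketch accumulates A = Σ φ(s)(φ(s) − γφ(s'))ᵀ and b = Σ φ(s)r over sampled transitions and then solves Aw = b for the value-function weights.

```python
def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting.

    Plain-Python helper (hypothetical, for self-containment only);
    a numerical library would normally be used instead.
    """
    n = len(b)
    # Build the augmented matrix [A | b] without mutating the inputs.
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular; collect more samples")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= factor * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def lstd(samples, features, n_features, gamma):
    """Estimate value-function weights from (state, reward, next_state) samples."""
    A = [[0.0] * n_features for _ in range(n_features)]
    b = [0.0] * n_features
    for state, reward, next_state in samples:
        phi = features(state)
        phi_next = features(next_state)
        for i in range(n_features):
            b[i] += phi[i] * reward
            for j in range(n_features):
                A[i][j] += phi[i] * (phi[j] - gamma * phi_next[j])
    return solve_linear(A, b)


def one_hot(state):
    """Tabular features for a two-state chain (assumed example)."""
    return [1.0 if state == i else 0.0 for i in range(2)]


# Assumed toy chain: state 0 moves to state 1 with reward 0,
# state 1 loops on itself with reward 1.
samples = [(0, 0.0, 1), (1, 1.0, 1)]
weights = lstd(samples, one_hot, n_features=2, gamma=0.5)
print(weights)  # [1.0, 2.0]
```

With one-hot features the weights are the state values themselves: V(1) = 1 / (1 − 0.5) = 2 and V(0) = 0 + 0.5 · 2 = 1. Because A and b are sums over the sampled transitions, states that are sampled more often carry more weight in the solution, which is the dependence on the visitation distribution noted above.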

artificial intelligence, machine learning, reinforcement learning

Country:
- North America > United States > California > San Francisco County > San Francisco (0.15)

Genre:
- Research Report > New Finding (0.46)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
