Policy Evaluation Using the Ω-Return

Thomas, Philip S., Niekum, Scott, Theocharous, Georgios, Konidaris, George

Dec-31-2015–Neural Information Processing Systems

We propose the Ω-return as an alternative to the λ-return currently used by the TD(λ) family of algorithms. The benefit of the Ω-return is that it accounts for the correlation between returns of different lengths. Because the Ω-return is difficult to compute exactly, we suggest one way of approximating it. We provide empirical studies suggesting that it is superior to the λ-return and γ-return on a variety of problems.
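As a rough illustration of the contrast drawn in the abstract, the sketch below compares two ways of weighting n-step returns. The λ-return uses fixed geometric weights that ignore how the returns co-vary. The second weighting uses a covariance matrix of the n-step returns and applies the standard minimum-variance combination for correlated estimates. All function names are hypothetical. The covariance matrix is assumed to be given, whereas the abstract notes that the exact quantity is difficult to compute, and this is not claimed to be the authors' approximation.

```python
import numpy as np


def n_step_returns(rewards, values, gamma):
    """n-step returns from time 0 for one episode (illustrative helper).

    rewards[t] is the reward received after step t; values[t] is a value
    estimate for state t, with len(values) == len(rewards) + 1.
    """
    horizon = len(rewards)
    returns = np.empty(horizon)
    discounted_sum = 0.0
    for n in range(1, horizon + 1):
        discounted_sum += gamma ** (n - 1) * rewards[n - 1]
        # n-step return: n discounted rewards plus a bootstrapped value.
        returns[n - 1] = discounted_sum + gamma ** n * values[n]
    return returns


def lambda_weights(horizon, lam):
    """Geometric lambda-return weights; leftover mass goes to the longest return."""
    weights = (1.0 - lam) * lam ** np.arange(horizon)
    weights[-1] = lam ** (horizon - 1)
    return weights


def covariance_weights(cov):
    """Minimum-variance weights that sum to one, given a covariance matrix.

    Assumption: cov is a known, invertible covariance matrix of the
    n-step returns. In practice it would have to be estimated.
    """
    ones = np.ones(cov.shape[0])
    solved = np.linalg.solve(cov, ones)
    return solved / (ones @ solved)


rewards = [1.0, 1.0, 1.0]
values = [0.0, 0.0, 0.0, 0.0]
returns = n_step_returns(rewards, values, gamma=1.0)

w_lambda = lambda_weights(len(returns), lam=0.5)
w_cov = covariance_weights(np.diag([1.0, 2.0, 4.0]))  # made-up covariance

lambda_return = float(w_lambda @ returns)
covariance_weighted_return = float(w_cov @ returns)
```

Both weight vectors sum to one, so each target is a weighted average of the same n-step returns. Only the covariance-based weights change when the returns are more or less correlated.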

machine learning, reinforcement learning, trajectory

Country:
- North America > United States > Massachusetts (0.28)

Genre:
- Research Report > New Finding (0.46)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.72)
