Policy Evaluation Using the Ω-Return
Thomas, Philip S., Niekum, Scott, Theocharous, Georgios, Konidaris, George
Neural Information Processing Systems
We propose the Ω-return as an alternative to the λ-return currently used by the TD(λ) family of algorithms. The benefit of the Ω-return is that it accounts for the correlation between returns of different lengths. Because the Ω-return is difficult to compute exactly, we suggest one way of approximating it. We provide empirical studies suggesting that it is superior to the λ-return and γ-return on a variety of problems.
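As an illustration of the idea in the abstract, the sketch below contrasts two ways of combining the n-step returns of a single episode into one target. The first uses the standard λ-return weights, which decay geometrically and ignore any correlation between returns. The second uses minimum-variance (generalized least squares) weights computed from a covariance matrix of the returns, which is one standard way to combine correlated unbiased estimates. The function names, the episode layout, and the assumption that a covariance matrix is available are all hypothetical choices made for this example; the abstract does not give the exact definition or the approximation used, so this is not presented as the authors' method.

```python
import numpy as np


def n_step_returns(rewards, values, gamma):
    """Return the n-step returns from time 0, for n = 1..T.

    Assumed layout (hypothetical): rewards[t] is the reward received after
    step t, and values[t] is the current value estimate of the state at
    time t, with values[T] = 0 for the terminal state.
    """
    T = len(rewards)
    returns = np.empty(T)
    discounted_sum = 0.0
    for n in range(1, T + 1):
        discounted_sum += gamma ** (n - 1) * rewards[n - 1]
        # n-step return: n discounted rewards plus a bootstrapped value.
        returns[n - 1] = discounted_sum + gamma ** n * values[n]
    return returns


def lambda_weights(T, lam):
    """Episodic lambda-return weights over the T available returns."""
    w = (1.0 - lam) * lam ** np.arange(T)
    # The longest (Monte Carlo) return receives all remaining weight.
    w[-1] = lam ** (T - 1)
    return w


def min_variance_weights(cov):
    """Weights that sum to 1 and minimize the variance of the combination.

    cov is an assumed covariance matrix of the n-step returns; in practice
    it is unknown and would have to be estimated or approximated.
    """
    ones = np.ones(cov.shape[0])
    x = np.linalg.solve(cov, ones)
    return x / x.sum()


def combined_return(returns, weights):
    """Weighted combination of the n-step returns."""
    return float(np.dot(weights, returns))
```

With rewards `[1, 1, 1]`, value estimates `[0.5, 0.5, 0.5, 0.0]`, and `gamma = 1`, the three n-step returns are 1.5, 2.5, and 3.0; passing them to `combined_return` with either weight vector yields a single target. When the covariance matrix is the identity, the minimum-variance weights reduce to a plain average, so any difference from that case comes from the variances and correlations of the returns.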
Dec-31-2015
- Country:
- North America > United States > Massachusetts (0.28)
- Genre:
- Research Report > New Finding (0.46)