Policy Evaluation Using the Ω-Return
Neural Information Processing Systems
We propose the Ω-return as an alternative to the λ-return currently used by the TD(λ) family of algorithms. The benefit of the Ω-return is that it accounts for the correlation between returns of different lengths. Because the Ω-return is difficult to compute exactly, we suggest one way of approximating it. We provide empirical studies that suggest this approximation is superior to the λ-return and γ-return for a variety of problems.
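The contrast between the two returns can be sketched as two ways of weighting the same set of n-step returns G^(1), ..., G^(L) from one episode. The λ-return uses fixed geometric weights, (1-λ)λ^(n-1) for n < L and λ^(L-1) for the full return. If Ω is the covariance matrix of the n-step returns, the weights w = Ω⁻¹1 / (1ᵀΩ⁻¹1) minimize the variance wᵀΩw among all weightings that sum to one, and this is how we read the Ω-return's use of the correlations between returns. This is a minimal sketch under stated assumptions: Ω is taken as known and positive definite, the paper's own approximation of the Ω-return is not reproduced, the γ-return is not shown, and every function name (`n_step_returns`, `lambda_weights`, `omega_weights`, `solve`, `weighted_return`) is hypothetical.

```python
from typing import List, Sequence


def n_step_returns(rewards: Sequence[float], values: Sequence[float], gamma: float) -> List[float]:
    """Hypothetical helper: all n-step returns G^(1..L) from time 0 of an episode.

    rewards[t] is R_{t+1}; values[t] is the current estimate v(S_t).
    The episode is assumed to terminate after L = len(rewards) steps, so the
    L-step return is the full Monte Carlo return (no bootstrap term).
    """
    L = len(rewards)
    returns = []
    discounted = 0.0
    for n in range(1, L + 1):
        discounted += gamma ** (n - 1) * rewards[n - 1]
        # Bootstrap from v(S_n) unless the episode has already ended.
        bootstrap = gamma ** n * values[n] if n < L else 0.0
        returns.append(discounted + bootstrap)
    return returns


def lambda_weights(L: int, lam: float) -> List[float]:
    """Episodic λ-return weights: (1-λ)λ^(n-1) for n < L, λ^(L-1) for n = L."""
    w = [(1 - lam) * lam ** (n - 1) for n in range(1, L)]
    w.append(lam ** (L - 1))  # the remaining mass goes to the full return
    return w


def solve(A: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def omega_weights(omega: Sequence[Sequence[float]]) -> List[float]:
    """Minimum-variance weights w = Ω⁻¹1 / (1ᵀΩ⁻¹1), assuming Ω is known and positive definite."""
    x = solve(omega, [1.0] * len(omega))
    total = sum(x)
    return [xi / total for xi in x]


def weighted_return(returns: Sequence[float], weights: Sequence[float]) -> float:
    """Combine the n-step returns into a single target using the given weights."""
    return sum(w * g for w, g in zip(weights, returns))
```

For example, with uncorrelated returns of variances 1 and 4, `omega_weights([[1.0, 0.0], [0.0, 4.0]])` puts weight 0.8 on the lower-variance return, whereas `lambda_weights` ignores Ω entirely. When all returns are equally correlated with equal variances, the minimum-variance weights are uniform.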
Mar-12-2024, 20:44:59 GMT
- Country:
  - North America > United States
    - Texas > Travis County
      - Austin (0.04)
    - Pennsylvania > Philadelphia County
      - Philadelphia (0.04)
    - Massachusetts
      - Middlesex County > Cambridge (0.04)
      - Hampshire County > Amherst (0.04)
- Genre:
  - Research Report > New Finding (0.46)