Policy Evaluation Using the Ω-Return

Mar-12-2024, 20:44:59 GMT–Neural Information Processing Systems

We propose the Ω-return as an alternative to the λ-return currently used by the TD(λ) family of algorithms. The benefit of the Ω-return is that it accounts for the correlation between returns of different lengths. Because it is difficult to compute exactly, we suggest one way of approximating the Ω-return. We provide empirical studies suggesting that it is superior to the λ-return and γ-return for a variety of problems.
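The contrast the abstract draws can be made concrete with a minimal NumPy sketch. It assumes the standard definition of the λ-return as a geometrically weighted average of n-step returns over a finite episode. It models the Ω-return as the minimum-variance weighted average w = Ω⁻¹1 / (1ᵀΩ⁻¹1), where Ω is the covariance matrix of the n-step returns. The function names, the toy episode, and the toy covariance are hypothetical illustrations. Ω is treated as known here, whereas the abstract notes that in practice it must be approximated.

```python
import numpy as np


def n_step_returns(rewards, values, gamma):
    """n-step returns G^(1), ..., G^(L) from the first state of an episode.

    rewards[k] is the reward received after step k; values[n] is the current
    estimate of the value of the state reached after n steps, with
    len(values) == len(rewards) + 1 and values[-1] == 0 at termination.
    G^(n) = sum_{k<n} gamma^k * rewards[k] + gamma^n * values[n].
    """
    returns = np.empty(len(rewards))
    discounted_rewards = 0.0
    for n in range(1, len(rewards) + 1):
        discounted_rewards += gamma ** (n - 1) * rewards[n - 1]
        returns[n - 1] = discounted_rewards + gamma ** n * values[n]
    return returns


def lambda_weights(length, lam):
    """λ-return weights: (1 - λ) λ^(n-1) for n < L, and λ^(L-1) for the full return."""
    weights = (1.0 - lam) * lam ** np.arange(length, dtype=float)
    weights[-1] = lam ** (length - 1)
    return weights


def omega_weights(omega):
    """Minimum-variance weights w = Ω⁻¹1 / (1ᵀΩ⁻¹1).

    Assumption: omega is the symmetric, positive-definite covariance matrix of
    the n-step returns; its off-diagonal entries encode their correlation.
    """
    ones = np.ones(omega.shape[0])
    x = np.linalg.solve(omega, ones)  # x = Ω⁻¹1 without forming the inverse
    return x / ones.dot(x)


# Hypothetical three-step episode.
rewards = [1.0, 0.0, 2.0]
values = [0.5, 0.4, 0.3, 0.0]  # values[-1] = 0 because the episode terminates
returns = n_step_returns(rewards, values, gamma=0.9)

lam_w = lambda_weights(len(returns), lam=0.5)
print("lambda-return:", lam_w @ returns)

# Hypothetical covariance: each extra step adds independent unit-variance
# noise on top of the shorter return, so Cov(G^(i), G^(j)) = min(i, j).
n = np.arange(1, len(returns) + 1)
omega = np.minimum.outer(n, n).astype(float)
print("Omega weights:", omega_weights(omega))
print("inverse-variance weights (correlation ignored):",
      omega_weights(np.diag(np.diag(omega))))
```

In this toy model the Ω-weights come out as approximately [1, 0, 0]: every longer return shares all of the shorter returns' noise and only adds more, so the minimum-variance combination keeps only G^(1). Using only the diagonal of Ω (ignoring correlation) instead gives inverse-variance weights of [6/11, 3/11, 2/11], which illustrates how accounting for correlation changes the weighting.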
