Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation
Qiang Liu, Lihong Li, Ziyang Tang, Dengyong Zhou
–Neural Information Processing Systems
In this paper, we propose a new off-policy estimator that applies IS directly on the stationary state-visitation distributions to avoid the exploding variance faced by existing methods.
Neural Information Processing Systems
Nov-20-2025, 20:37:37 GMT
- Country:
- Europe > United Kingdom
- England > Cambridgeshire > Cambridge (0.04)
- North America
- Canada > Quebec
- Montreal (0.04)
- United States
- California > Santa Clara County
- Palo Alto (0.04)
- Massachusetts > Middlesex County
- Cambridge (0.04)
- Texas > Travis County
- Austin (0.14)
- California > Santa Clara County
- Canada > Quebec
- Europe > United Kingdom
- Industry:
- Transportation (0.47)
- Technology: