Reviews: Towards Optimal Off-Policy Evaluation for Reinforcement Learning with Marginalized Importance Sampling
Neural Information Processing Systems
The paper studies the important problem of off-policy evaluation in long-horizon MDPs, focusing on small-state, large-action problems. It proposes a novel estimator and analyzes its finite-sample statistical properties. Empirical results show that the method is useful, especially in partially observable problems. The reviewers feel that the experimental section could be strengthened (e.g., by using more domains).
Jan-23-2025, 15:22:06 GMT