Reviews: Towards Optimal Off-Policy Evaluation for Reinforcement Learning with Marginalized Importance Sampling

Jan-23-2025, 15:22:06 GMT–Neural Information Processing Systems

The paper studies the important problem of off-policy policy evaluation in long-horizon MDPs. The setting focuses on small-state, large-action problems. A novel estimator is proposed, whose finite-sample statistical properties are studied. Empirical results show the method is useful, especially in partially observable problems. Reviewers feel the experiment section can be strengthened (e.g., using more domains).

marginalized importance sampling, optimal off-policy evaluation, reinforcement learning
