Reviews: Towards Optimal Off-Policy Evaluation for Reinforcement Learning with Marginalized Importance Sampling

Jan-23-2025, 15:22:17 GMT–Neural Information Processing Systems

Originality: The main idea of the paper, which is avoiding the long-horizon problem by computing importance sampling (IS) over state distributions rather than over trajectories, was already introduced in (Liu et al.). However, the way the authors leverage this idea is original. Moreover, little published work yet builds on this potentially important idea (IS over state distributions), so even as the second paper in this direction, this work charts new territory.

Quality: To the extent that I checked it, the theoretical work is solid. I did not verify every equality in the proofs for algebraic errors, but I did follow every step of the proofs in the appendix.
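To make the core idea concrete, here is a minimal Python sketch for a tabular, finite-horizon setting that contrasts ordinary trajectory-wise importance sampling with a marginalized variant. Trajectory-wise IS weights each return by a product of per-step ratios π(a|s)/μ(a|s), so the number of factors grows with the horizon; the marginalized variant instead weights each reward r_t by an estimated state-marginal ratio d^π_t(s_t)/d^μ_t(s_t) times a single action ratio. The function names, the toy MDP, and the choice to estimate d^π_t by rolling an empirical transition model forward are illustrative assumptions, not the estimator or notation of the paper under review.

```python
from collections import defaultdict

# Data format (assumed for this sketch): a trajectory is a list of
# (state, action, reward) tuples; a policy is policy[s][a] = probability.


def trajectory_is(data, pi, mu):
    """Ordinary (trajectory-wise) IS: weight each episode's return by the
    product of per-step ratios pi(a|s) / mu(a|s)."""
    total = 0.0
    for traj in data:
        weight, ret = 1.0, 0.0
        for s, a, r in traj:
            weight *= pi[s][a] / mu[s][a]
            ret += r
        total += weight * ret
    return total / len(data)


def marginalized_is(data, pi, mu, horizon):
    """Marginalized IS sketch (tabular, finite horizon): reweight each reward
    r_t by d_pi_t(s_t) / d_mu_t(s_t) * pi(a_t|s_t) / mu(a_t|s_t)."""
    n = len(data)
    # Empirical behavior-policy state marginals d_mu[t][s].
    d_mu = [defaultdict(float) for _ in range(horizon)]
    # Per-step transition counts: counts[t][(s, a)][s_next].
    counts = [defaultdict(lambda: defaultdict(int)) for _ in range(horizon - 1)]
    for traj in data:
        for t, (s, a, _) in enumerate(traj):
            d_mu[t][s] += 1.0 / n
            if t + 1 < horizon:
                counts[t][(s, a)][traj[t + 1][0]] += 1
    # Both policies share the initial-state distribution; roll the target
    # policy's state marginal forward through the empirical model (assumption).
    d_pi = [dict(d_mu[0])]
    for t in range(horizon - 1):
        nxt = defaultdict(float)
        for s, p_s in d_pi[t].items():
            for a, p_a in pi[s].items():
                seen = counts[t].get((s, a))
                if not seen:
                    continue  # unseen (s, a): its mass is dropped in this sketch
                n_sa = sum(seen.values())
                for s_next, c in seen.items():
                    nxt[s_next] += p_s * p_a * c / n_sa
        d_pi.append(nxt)
    # Each reward gets one state-marginal ratio and one action ratio.
    total = 0.0
    for traj in data:
        for t, (s, a, r) in enumerate(traj):
            w_state = d_pi[t].get(s, 0.0) / d_mu[t][s]
            total += w_state * pi[s][a] / mu[s][a] * r
    return total / n


# Hypothetical toy MDP: states {0, 1}, actions {0, 1}, horizon 2, every episode
# starts in state 0; action 0 moves to state 1, action 1 moves to state 0, and
# the reward is 1 exactly when the current state is 1. True value of pi: 1.0.
mu = {s: {0: 0.5, 1: 0.5} for s in (0, 1)}  # uniform behavior policy
pi = {s: {0: 1.0, 1: 0.0} for s in (0, 1)}  # target: always take action 0
data = [
    [(0, 0, 0.0), (1, 0, 1.0)],
    [(0, 0, 0.0), (1, 1, 1.0)],
    [(0, 1, 0.0), (0, 0, 0.0)],
    [(0, 1, 0.0), (0, 1, 0.0)],
]

if __name__ == "__main__":
    print(trajectory_is(data, pi, mu))               # 1.0
    print(marginalized_is(data, pi, mu, horizon=2))  # 1.0
```

Both estimators recover the true value 1.0 on this toy dataset. The difference lies in the weights: at step t the trajectory-wise weight multiplies t + 1 ratios, whereas the marginalized weight uses one state-marginal ratio and one action ratio, which is the intuition behind avoiding the long-horizon problem.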

marginalized importance sampling, state distribution, trajectory
