Reviews: Intrinsically Efficient, Stable, and Bounded Off-Policy Evaluation for Reinforcement Learning

Jan-23-2025, 23:08:16 GMT–Neural Information Processing Systems

This paper presents new estimators for Off-Policy Evaluation (OPE) based on likelihoods and argues that the new estimators are better than Importance Sampling (IS). The paper provides strong theoretical guarantees for the estimators and demonstrates them through simple experiments. The reviewers agree that the paper is well written overall and that the proposed methods are technically sound and likely to be built upon by the community. One reviewer is unsure whether the proposed methods will be practical in RL applications. The experiments are performed on very simple tasks.

artificial intelligence, machine learning, reinforcement learning
