Reviews: Intrinsically Efficient, Stable, and Bounded Off-Policy Evaluation for Reinforcement Learning
–Neural Information Processing Systems
This is a key study in the off-policy evaluation (OPE) literature, since practical applications of RL require off-policy methods with better stability.
- Table 1 is useful: it provides a good summary and comparison of existing OPE estimators. Section 2.1 further summarizes existing OPE estimators in terms of consistency, stability and boundedness. This part is well written and easy to follow, and it is useful for the community because it directly compares existing OPE estimators across several properties.
Jan-23-2025, 23:08:26 GMT