Reviews: Intrinsically Efficient, Stable, and Bounded Off-Policy Evaluation for Reinforcement Learning

Jan-23-2025, 23:08:26 GMT–Neural Information Processing Systems

This is a key study in the off-policy evaluation (OPE) literature, as more stable off-policy methods are required for practical applications of RL.
- Table 1 is useful: it provides a good summary and comparison of existing OPE estimators. Section 2.1 further summarizes existing OPE estimators in terms of consistency, stability, and boundedness. This part is well written and easy to follow, and it is useful to the community because it directly compares existing OPE estimators across several properties.

bounded off-policy evaluation, estimator, ope estimator


Genre:
- Summary/Review (0.36)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)