AITopics | bounded off-policy evaluation

Intrinsically Efficient, Stable, and Bounded Off-Policy Evaluation for Reinforcement Learning

Neural Information Processing Systems, Dec-25-2025, 10:32:04 GMT

Off-policy evaluation (OPE) in both contextual bandits and reinforcement learning allows one to evaluate novel decision policies without conducting exploration, which is often costly or otherwise infeasible. The problem's importance has attracted many proposed solutions, including importance sampling (IS), self-normalized IS (SNIS), and doubly robust (DR) estimates. DR and its variants ensure semiparametric local efficiency if the Q-functions are well-specified, but if the Q-functions are misspecified, DR can be worse than both IS and SNIS. DR also lacks the inherent stability and boundedness of SNIS. We propose new estimators for OPE based on empirical likelihood that are always more efficient than IS, SNIS, and DR and that satisfy the same stability and boundedness properties as SNIS. Along the way, we categorize various properties and classify existing estimators by them. Besides the theoretical guarantees, empirical studies suggest that the new estimators provide advantages.
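To make the IS and SNIS estimators named above concrete, here is a minimal contextual-bandit sketch. It assumes logged rewards in [0, 1] and exact importance weights w_i = pi_e(a_i | x_i) / pi_b(a_i | x_i); the function names and the toy data are hypothetical. It illustrates the boundedness property: SNIS is a weighted average of the observed rewards, so it cannot leave their range, whereas plain IS can.

```python
import numpy as np


def is_estimate(weights, rewards):
    """Importance sampling (IS): the plain average of w_i * r_i."""
    weights = np.asarray(weights, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    return float(np.mean(weights * rewards))


def snis_estimate(weights, rewards):
    """Self-normalized IS (SNIS): normalize by sum(w_i) instead of n."""
    weights = np.asarray(weights, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    return float(np.sum(weights * rewards) / np.sum(weights))


# Hypothetical logged data: rewards lie in [0, 1].
weights = np.array([0.5, 4.0, 0.2, 3.0])
rewards = np.array([1.0, 1.0, 0.0, 1.0])

print(is_estimate(weights, rewards))    # 1.875, outside the reward range [0, 1]
print(snis_estimate(weights, rewards))  # about 0.974 (= 7.5 / 7.7), inside [0, 1]
```

Dividing by sum(w_i) rather than n is what turns SNIS into a convex combination of the observed rewards; the trade-off is a small finite-sample bias that vanishes as n grows.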

bounded off-policy evaluation, name change, reinforcement learning


Reviews: Intrinsically Efficient, Stable, and Bounded Off-Policy Evaluation for Reinforcement Learning

Neural Information Processing Systems, Jan-23-2025, 23:08:26 GMT

This is a key study in the OPE literature, as techniques that improve the stability of off-policy methods are required for practical applications of RL.
- Table 1 is useful: it provides a good summary and comparison of existing OPE estimators. Section 2.1 further summarizes existing OPE estimators in terms of consistency, stability, and boundedness. This part is well written, easy to follow, and useful for the community, as it provides a direct comparison of existing OPE estimators across several properties.

bounded off-policy evaluation, estimator, ope estimator


Reviews: Intrinsically Efficient, Stable, and Bounded Off-Policy Evaluation for Reinforcement Learning

Neural Information Processing Systems, Jan-23-2025, 23:08:16 GMT

This paper presents new estimators for off-policy evaluation (OPE) based on empirical likelihood and argues that they are better than importance sampling (IS). The paper provides strong theoretical guarantees for the estimators and demonstrates their performance through simple experiments. The reviewers agree that the paper is well written overall and that the proposed methods are technically sound and likely to be built upon by the community. One reviewer is unsure whether the proposed methods will be practical in RL applications. The experiments are performed on very simple tasks.

artificial intelligence, machine learning, reinforcement learning


Intrinsically Efficient, Stable, and Bounded Off-Policy Evaluation for Reinforcement Learning

Neural Information Processing Systems, Oct-10-2024, 02:27:56 GMT

Off-policy evaluation (OPE) in both contextual bandits and reinforcement learning allows one to evaluate novel decision policies without conducting exploration, which is often costly or otherwise infeasible. The problem's importance has attracted many proposed solutions, including importance sampling (IS), self-normalized IS (SNIS), and doubly robust (DR) estimates. DR and its variants ensure semiparametric local efficiency if the Q-functions are well-specified, but if the Q-functions are misspecified, DR can be worse than both IS and SNIS. DR also lacks the inherent stability and boundedness of SNIS. We propose new estimators for OPE based on empirical likelihood that are always more efficient than IS, SNIS, and DR and that satisfy the same stability and boundedness properties as SNIS.
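For the DR estimator mentioned in the abstract, a common contextual-bandit form adds an importance-weighted residual to a model-based term: V_DR = (1/n) sum_i [ sum_a pi_e(a | x_i) q_hat(x_i, a) + w_i (r_i - q_hat(x_i, a_i)) ]. The sketch below assumes discrete actions, known behavior probabilities, and a fitted Q-model supplied as an array q_hat; the function name and the toy data are hypothetical, and this is one standard DR variant rather than the specific variants the paper compares.

```python
import numpy as np


def dr_estimate(pi_e, pi_b, actions, rewards, q_hat):
    """Doubly robust (DR) value estimate for a contextual bandit.

    pi_e, pi_b : (n, K) action probabilities of the evaluation / behavior policy
    actions    : (n,) logged action indices
    rewards    : (n,) logged rewards
    q_hat      : (n, K) model predictions q_hat[i, a] of the expected reward
    """
    pi_e = np.asarray(pi_e, dtype=float)
    pi_b = np.asarray(pi_b, dtype=float)
    actions = np.asarray(actions, dtype=int)
    rewards = np.asarray(rewards, dtype=float)
    q_hat = np.asarray(q_hat, dtype=float)

    rows = np.arange(actions.size)
    w = pi_e[rows, actions] / pi_b[rows, actions]      # importance weights
    direct = np.sum(pi_e * q_hat, axis=1)              # model value of pi_e at each x_i
    correction = w * (rewards - q_hat[rows, actions])  # weighted model residual
    return float(np.mean(direct + correction))


# Hypothetical toy data: 3 logged rounds, 2 actions.
pi_b = np.array([[0.5, 0.5], [0.8, 0.2], [0.5, 0.5]])
pi_e = np.array([[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]])
actions = np.array([0, 1, 1])
rewards = np.array([1.0, 0.0, 1.0])

# With a zero Q-model, DR collapses to plain IS: mean(w_i * r_i).
print(dr_estimate(pi_e, pi_b, actions, rewards, np.zeros((3, 2))))  # about 0.933
```

Passing q_hat = 0 recovers plain IS, which makes explicit how much of DR's benefit rests on the Q-model. Unlike SNIS, the DR estimate is not a weighted average of observed rewards, so nothing forces it to stay within the reward range.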

bounded off-policy evaluation, reinforcement learning, snis


Intrinsically Efficient, Stable, and Bounded Off-Policy Evaluation for Reinforcement Learning

Kallus, Nathan; Uehara, Masatoshi

Neural Information Processing Systems, Mar-18-2020, 21:47:35 GMT

Off-policy evaluation (OPE) in both contextual bandits and reinforcement learning allows one to evaluate novel decision policies without conducting exploration, which is often costly or otherwise infeasible. The problem's importance has attracted many proposed solutions, including importance sampling (IS), self-normalized IS (SNIS), and doubly robust (DR) estimates. DR and its variants ensure semiparametric local efficiency if the Q-functions are well-specified, but if the Q-functions are misspecified, DR can be worse than both IS and SNIS. DR also lacks the inherent stability and boundedness of SNIS. We propose new estimators for OPE based on empirical likelihood that are always more efficient than IS, SNIS, and DR and that satisfy the same stability and boundedness properties as SNIS. Along the way, we categorize various properties and classify existing estimators by them.
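The empirical-likelihood idea can be illustrated in its simplest bandit form. The sketch below is a deliberately simplified assumption, not the authors' estimator (theirs is more general): it imposes only the constraint that the reweighted importance weights average to one, chooses probabilities c_i by maximizing sum(log c_i), and reports sum(c_i * w_i * r_i). Because the c_i * w_i are nonnegative and sum to one, the result is a weighted average of observed rewards and therefore bounded, as with SNIS. Function names and data are hypothetical.

```python
import numpy as np


def el_probabilities(weights, tol=1e-12, max_iter=200):
    """Empirical-likelihood probabilities c_i for importance weights w_i.

    Maximizes sum(log c_i) subject to sum(c_i) = 1 and sum(c_i * (w_i - 1)) = 0.
    The maximizer is c_i = 1 / (n * (1 + lam * g_i)) with g_i = w_i - 1, where lam
    is the root of score(lam) = sum(g_i / (1 + lam * g_i)), found here by bisection.
    """
    g = np.asarray(weights, dtype=float) - 1.0
    n = g.size
    if np.allclose(g, 0.0):
        return np.full(n, 1.0 / n)  # constraint already holds with uniform c_i
    if g.max() <= 0.0 or g.min() >= 0.0:
        raise ValueError("weights must lie on both sides of 1 for a solution to exist")

    # lam must keep every 1 + lam * g_i positive: lam in (-1/max(g), -1/min(g)).
    lo, hi = -1.0 / g.max(), -1.0 / g.min()
    pad = 1e-12 * (hi - lo)
    lo, hi = lo + pad, hi - pad

    def score(lam):
        return np.sum(g / (1.0 + lam * g))  # strictly decreasing in lam

    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid  # root lies to the right of mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    lam = 0.5 * (lo + hi)
    return 1.0 / (n * (1.0 + lam * g))


def el_estimate(weights, rewards):
    """Average of rewards with nonnegative weights c_i * w_i that sum to ~1."""
    c = el_probabilities(weights)
    w = np.asarray(weights, dtype=float)
    return float(np.sum(c * w * np.asarray(rewards, dtype=float)))


# Hypothetical logged data: rewards lie in [0, 1].
weights = np.array([0.5, 4.0, 0.2, 3.0])
rewards = np.array([1.0, 1.0, 0.0, 1.0])
c = el_probabilities(weights)
print(c.sum(), (c * weights).sum())  # both approximately 1
print(el_estimate(weights, rewards))  # lies in [0, 1]
```

Bisection is used because the score is monotone in lam on the feasible interval, so it converges without tuning; a Newton iteration would be faster but needs safeguarding to keep every 1 + lam * g_i positive.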

bounded off-policy evaluation, reinforcement learning, snis

