AITopics | off-policy confidence interval estimation

CoinDICE: Off-Policy Confidence Interval Estimation

Neural Information Processing Systems | Dec-24-2025, 03:41:53 GMT

We study high-confidence behavior-agnostic off-policy evaluation in reinforcement learning, where the goal is to estimate a confidence interval on a target policy's value, given only access to a static experience dataset collected by unknown behavior policies. Starting from a function space embedding of the linear program formulation of the Q-function, we obtain an optimization problem with generalized estimating equation constraints. By applying the generalized empirical likelihood method to the resulting Lagrangian, we propose CoinDICE, a novel and efficient algorithm for computing confidence intervals. Theoretically, we prove the obtained confidence intervals are valid, in both asymptotic and finite-sample regimes. Empirically, we show in a variety of benchmarks that the confidence interval estimates are tighter and more accurate than existing methods.
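
As a rough illustration of the empirical-likelihood idea mentioned in the abstract, the sketch below computes a confidence interval for a plain sample mean by reweighting the data inside a divergence ball around the empirical distribution. It is not the CoinDICE algorithm: there is no Q-function, no Lagrangian, and no function approximation. The function name `chi2_el_interval`, the choice f(t) = (t - 1)^2 for the divergence, and the default quantile 3.841 (the 0.95 quantile of a chi-square distribution with one degree of freedom) are all assumptions made for this example.

```python
import math


def chi2_el_interval(samples, chi2_quantile=3.841):
    """Interval for a mean from a chi-square-divergence ball of weights.

    Assumption (not taken from the paper): with f(t) = (t - 1)^2 the
    admissible weights w satisfy sum_i (n * w_i - 1)^2 <= chi2_quantile,
    sum_i w_i = 1, and w_i >= 0. The interval endpoints are the smallest
    and largest weighted means sum_i w_i * x_i over that set.
    """
    n = len(samples)
    mean = sum(samples) / n
    centered = [x - mean for x in samples]
    norm = math.sqrt(sum(c * c for c in centered))
    if norm == 0.0:
        # All samples are identical, so every reweighting gives the same mean.
        return mean, mean
    radius = math.sqrt(chi2_quantile)
    # The extremal weights are w_i = (1 +/- radius * c_i / norm) / n.
    # This closed form is only valid while all of them stay non-negative.
    if radius * max(abs(c) for c in centered) / norm > 1.0:
        raise ValueError("too few samples for the closed-form solution")
    half_width = radius * norm / n
    return mean - half_width, mean + half_width


if __name__ == "__main__":
    # Hypothetical per-sample value estimates; any list of floats works.
    estimates = [float(i % 10) for i in range(100)]
    lower, upper = chi2_el_interval(estimates)
    print(f"interval: [{lower:.3f}, {upper:.3f}]")
```

For this quadratic divergence the half-width reduces to the square root of the quantile times the sample standard deviation over the square root of n, so the result coincides with the familiar normal-approximation interval. The reweighting view is the part that extends to estimators defined through an optimization problem, which is the setting the abstract describes.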

coindice, name change, off-policy confidence interval estimation, (2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.62)
Information Technology > Artificial Intelligence > Machine Learning (0.42)

Review for NeurIPS paper: CoinDICE: Off-Policy Confidence Interval Estimation

Neural Information Processing Systems | Jan-25-2025, 09:02:39 GMT

Weaknesses: The confidence intervals depend on the choice of function approximator (which must admit a parameter configuration that satisfies the desired criteria) and on the optimization procedure (which must find that exact parameter configuration). Unlike prior bounds, which were non-parametric, the proposed bound is parametric, and no definite guidance is provided on how to select these parameters. Unfortunately, three such function approximators are needed in practice: one for the distribution ratio \tau, one for the Lagrangian \beta, and one for the constraint embedding \phi. This makes the confidence intervals dependent on both the choice of neural-network architecture (#layers, #nodes/layer, activation function, etc.) and the choice of optimization routine (step size, optimizer, initial distribution, etc.) used to find the saddle points. Further, the optimal design choices might vary from domain to domain, making it harder for the end user to apply this bound.

function approximator, neurips paper, off-policy confidence interval estimation, (2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.98)

CoinDICE: Off-Policy Confidence Interval Estimation

Neural Information Processing Systems | Oct-10-2024, 10:55:58 GMT

coindice, off-policy confidence interval estimation

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.31)
