Review for NeurIPS paper: CoinDICE: Off-Policy Confidence Interval Estimation

Neural Information Processing Systems

Weaknesses: The confidence intervals depend on the choice of function approximator, which must admit a parameter configuration that satisfies the desired criteria. They also depend on the optimization procedure, which must actually find that configuration. Unlike prior bounds, which were non-parametric, the proposed bound is parametric, and the paper gives no definite guidance on how to select these parameters. Unfortunately, three such function approximators are needed in practice: one for the distribution ratio \tau, one for the Lagrangian \beta, and another for the constraint embedding \phi. The confidence intervals therefore depend both on the choice of neural-network architecture (number of layers, nodes per layer, activation function, etc.) and on the choice of optimization routine (step size, optimizer, initial distribution, etc.) used to find the saddle points. Further, the optimal design choices might vary from domain to domain, which makes the bound harder for the end user to apply.


CoinDICE: Off-Policy Confidence Interval Estimation


We study high-confidence behavior-agnostic off-policy evaluation in reinforcement learning, where the goal is to estimate a confidence interval on a target policy's value, given only access to a static experience dataset collected by unknown behavior policies. Starting from a function space embedding of the linear program formulation of the Q-function, we obtain an optimization problem with generalized estimating equation constraints. By applying the generalized empirical likelihood method to the resulting Lagrangian, we propose CoinDICE, a novel and efficient algorithm for computing confidence intervals. Theoretically, we prove that the obtained confidence intervals are valid in both the asymptotic and finite-sample regimes. Empirically, we show on a variety of benchmarks that the confidence interval estimates are tighter and more accurate than those of existing methods.
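As a much-simplified illustration of the empirical likelihood step mentioned above, the sketch below computes a classic empirical-likelihood confidence interval for the mean of scalar samples. This is not CoinDICE. It omits the Lagrangian over \tau, \beta, and \phi entirely, uses the asymptotic chi-squared(1) calibration rather than any finite-sample correction, and the function names `el_stat` and `el_interval` are hypothetical. The interval collects every mean `mu` whose -2 log empirical-likelihood ratio stays below the chi-squared threshold. Equivalently, it spans the smallest and largest reweighted sample means over weightings that stay close to the uniform empirical distribution.

```python
import math
from statistics import NormalDist, fmean


def el_stat(xs, mu, iters=200):
    """-2 log empirical-likelihood ratio for the hypothesis E[X] = mu.

    Returns math.inf when mu lies outside the open interval (min(xs), max(xs)),
    where no positive reweighting of the sample can have mean mu.
    """
    d = [x - mu for x in xs]
    dmax, dmin = max(d), min(d)
    if dmax <= 0 or dmin >= 0:
        return math.inf
    # Optimal weights are w_i = 1 / (n * (1 + lam * d_i)), where lam solves
    # g(lam) = sum_i d_i / (1 + lam * d_i) = 0 on the interval that keeps
    # every 1 + lam * d_i > 0. g is strictly decreasing there, so bisect.
    lo, hi = -1.0 / dmax, -1.0 / dmin
    lam = 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid <= lo or mid >= hi:
            break  # the bracket has shrunk to adjacent floats
        lam = mid
        g = sum(di / (1.0 + mid * di) for di in d)
        if g > 0:
            lo = mid
        else:
            hi = mid
    terms = [1.0 + lam * di for di in d]
    if min(terms) <= 0:
        return math.inf
    return 2.0 * sum(math.log(t) for t in terms)


def el_interval(xs, alpha=0.05, tol=1e-10):
    """Asymptotic (1 - alpha) empirical-likelihood interval for E[X]."""
    # The chi-squared(1) quantile equals the squared two-sided normal quantile.
    thr = NormalDist().inv_cdf(1.0 - alpha / 2.0) ** 2
    m = fmean(xs)  # the statistic is 0 at the sample mean

    def crossing(inside, outside):
        # Bisect between a point with stat <= thr and a point beyond it.
        while abs(outside - inside) > tol:
            mid = 0.5 * (inside + outside)
            if el_stat(xs, mid) <= thr:
                inside = mid
            else:
                outside = mid
        return inside

    return crossing(m, min(xs)), crossing(m, max(xs))


if __name__ == "__main__":
    print(el_interval([-2.0, -1.0, 0.0, 1.0, 2.0]))
```

For the mean, the statistic grows monotonically as `mu` moves away from the sample mean in either direction, which is why each endpoint can be located by a simple bisection. In CoinDICE the same kind of likelihood-ratio constraint is instead placed on the reweighted Lagrangian, so this scalar example conveys only the calibration idea, not the full algorithm.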