Review for NeurIPS paper: CoinDICE: Off-Policy Confidence Interval Estimation

Jan-25-2025, 09:02:39 GMT–Neural Information Processing Systems

Weaknesses: The confidence intervals depend on the choice of function approximator (which must admit a parameter configuration that satisfies the desired criteria) and on the optimization procedure (which must find that exact configuration). Unlike prior bounds, which were non-parametric, the proposed bound is parametric, and the paper provides no definite guidance on how to select these parameters. Unfortunately, three such function approximators are needed in practice: one for the distribution ratio \tau, one for the Lagrangian \beta, and another for the constraint embedding \phi. This makes the confidence intervals dependent on both the choice of neural-network architecture (number of layers, nodes per layer, activation function, etc.) and the choice of optimization routine (step size, optimizer, initial distribution, etc.) used to find the saddle points. Further, the optimal design choices might vary from domain to domain, making it harder for an end user to apply this bound.
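To make the concern concrete, one way an end user might probe this dependence is to rerun the estimator over a small grid of seeds, network widths, and step sizes and check how far the interval endpoints move. The sketch below is only an illustration of such a check: `toy_interval` is a hypothetical stand-in with arbitrary constants, not the paper's estimator, and `endpoint_spread` and the grid values are likewise assumptions.

```python
import itertools
import random


def toy_interval(seed, hidden_units, step_size):
    """Hypothetical stand-in for a parametric interval estimator.

    This is NOT the CoinDICE procedure. It returns a (lower, upper) pair
    whose endpoints drift with the configuration, using arbitrary constants,
    so that the harness below has something to measure.
    """
    rng = random.Random(seed * 1_000_003 + hidden_units * 101 + int(step_size * 1e6))
    center = 1.0 + rng.gauss(0.0, 0.05)  # stands in for optimization noise
    half_width = 0.2 + 0.5 / hidden_units + 10.0 * step_size  # arbitrary toy dependence
    return center - half_width, center + half_width


def endpoint_spread(estimator, seeds, hidden_sizes, step_sizes):
    """Run the estimator over the full grid and report how much each endpoint moves."""
    results = [
        estimator(seed, hidden, step)
        for seed, hidden, step in itertools.product(seeds, hidden_sizes, step_sizes)
    ]
    lowers = [lower for lower, _ in results]
    uppers = [upper for _, upper in results]
    return {
        "n_runs": len(results),
        "lower_range": max(lowers) - min(lowers),
        "upper_range": max(uppers) - min(uppers),
    }


report = endpoint_spread(
    toy_interval,
    seeds=range(5),
    hidden_sizes=[16, 64, 256],
    step_sizes=[1e-3, 1e-2],
)
print(report)
```

If the reported ranges are comparable to the interval width itself, the interval reflects the design choices as much as the data, which is the situation the weakness above describes.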



