Review for NeurIPS paper: Minimax Value Interval for Off-Policy Evaluation and Policy Optimization

Feb-11-2025, 21:19:22 GMT–Neural Information Processing Systems

The paper provides a very general minimax framework for quantifying the bias/approximation error in off-policy evaluation (OPE), and the results apply to a range of OPE methods. Reviewers generally agree that this is a good paper that makes a contribution. One direction for improvement would be to quantify the statistical noise in off-policy evaluation, which is nontrivial but extremely important. The reviewers, AC, and SAC also agree that such analysis could be left for future work. We would also strongly suggest that the authors consider rephrasing or explaining the wording "confidence interval".

approximation error, minimax value interval, off-policy evaluation and policy optimization
