Minimax Value Interval for Off-Policy Evaluation and Policy Optimization

Dec-23-2025, 20:02:40 GMT–Neural Information Processing Systems

We study minimax methods for off-policy evaluation (OPE) using value functions and marginalized importance weights. Although they hold promise for overcoming the exponential variance of traditional importance sampling, several key problems remain: (1) They require function approximation and are generally biased. For the sake of trustworthy OPE, is there any way to quantify the biases?



Technology:
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.64)