Off-Policy Risk Assessment in Contextual Bandits
Neural Information Processing Systems
Even when they cannot run experiments, practitioners can evaluate prospective policies using previously logged data. However, while the bandits literature has adopted a diverse set of objectives, most research on off-policy evaluation to date focuses on the expected reward. In this paper, we introduce Lipschitz risk functionals, a broad class of objectives that subsumes conditional value-at-risk (CVaR), variance, mean-variance, many distorted risks, and cumulative prospect theory (CPT) risks, among others. We propose Off-Policy Risk Assessment (OPRA), a framework that first estimates a target policy's CDF of rewards and then generates plug-in estimates for any collection of Lipschitz risks, providing finite-sample guarantees that hold simultaneously over the entire class.
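The two-step idea described above (estimate the target policy's reward CDF from logged data, then plug it into several risk functionals) can be illustrated with a minimal sketch. This is not the paper's exact estimator: it assumes the logging policy's action probabilities are known, uses self-normalized importance weights so that the estimated CDF is a valid distribution, and computes only three example risks. All function and parameter names here are hypothetical.

```python
import numpy as np

def weighted_reward_distribution(rewards, target_probs, behavior_probs):
    """Step 1 (sketch): importance-weighted estimate of the reward distribution.

    Returns the sorted rewards and the probability mass placed on each one.
    Assumption: weights are self-normalized so the masses sum to 1.
    """
    rewards = np.asarray(rewards, dtype=float)
    weights = np.asarray(target_probs, dtype=float) / np.asarray(behavior_probs, dtype=float)
    order = np.argsort(rewards)
    grid = rewards[order]
    probs = weights[order] / weights.sum()
    return grid, probs

def cvar_lower(grid, probs, alpha):
    """Mean reward over the worst alpha-fraction of the estimated distribution."""
    cum = np.cumsum(probs)   # estimated CDF at each sorted reward
    prev = cum - probs       # CDF just below each sorted reward
    # Portion of each atom's mass that lies inside the lowest-alpha tail.
    tail_mass = np.clip(np.minimum(cum, alpha) - prev, 0.0, None)
    return float(np.sum(tail_mass * grid) / alpha)

def off_policy_risks(rewards, target_probs, behavior_probs, alpha=0.1):
    """Step 2 (sketch): plug the single estimated distribution into several risks."""
    grid, probs = weighted_reward_distribution(rewards, target_probs, behavior_probs)
    mean = float(np.sum(probs * grid))
    variance = float(np.sum(probs * (grid - mean) ** 2))
    return {"mean": mean, "variance": variance, "cvar": cvar_lower(grid, probs, alpha)}
```

Because every risk is computed from the same estimated distribution, adding another functional requires no further pass over the logged data, which is the practical appeal of estimating the CDF first.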
Mar-21-2025, 18:30:24 GMT