Counterfactual-Augmented Importance Sampling for Semi-Offline Policy Evaluation

Dec-24-2025, 06:13:23 GMT–Neural Information Processing Systems

In applying reinforcement learning (RL) to high-stakes domains, quantitative and qualitative evaluation using observational data can help practitioners understand the generalization performance of new policies. However, this type of off-policy evaluation (OPE) is inherently limited since offline data may not reflect the distribution shifts resulting from the application of new policies. On the other hand, online evaluation by collecting rollouts according to the new policy is often infeasible, as deploying new policies in these domains can be unsafe. In this work, we propose a semi-offline evaluation framework as an intermediate step between offline and online evaluation, where human users provide annotations of unobserved counterfactual trajectories. While tempting to simply augment existing data with such annotations, we show that this naive approach can lead to biased results.

annotation, counterfactual-augmented importance sampling, semi-offline policy evaluation, (8 more...)

Neural Information Processing Systems

Dec-24-2025, 06:13:23 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.96)