An Instrumental Variable Approach to Confounded Off-Policy Evaluation
Yang Xu, Jin Zhu, Chengchun Shi, Shikai Luo, Rui Song
arXiv.org Artificial Intelligence
Off-policy evaluation (OPE) estimates the discounted cumulative reward under a given target policy using an offline dataset collected from another (possibly unknown) behavior policy. OPE is important in situations where it is impractical or too costly to evaluate the target policy directly via online experimentation, including robotics (Quillen et al., 2018), precision medicine (Murphy, 2003; Kosorok and Laber, 2019; Tsiatis et al., 2019), economics, quantitative social science (Abadie and Cattaneo, 2018), recommendation systems (Li et al., 2010; Kiyohara et al., 2022), etc. Despite a large body of literature on OPE (see Section 2 for detailed discussions), many of these works rely on the assumption of no unmeasured confounders (NUC), which excludes the existence of unobserved variables that could confound either the action-reward or the action-next-state pair. This assumption, however, can be violated in real-world applications such as healthcare and the technology industry. Our paper is partly motivated by the need to evaluate the long-term treatment effects of certain app download ads on a short-video platform.
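For concreteness, the estimand that OPE targets can be written in standard discounted-reward notation (our notation, stated here for illustration rather than quoted from the abstract): with discount factor $\gamma \in [0,1)$ and rewards $R_t$ along a trajectory generated by the target policy $\pi$, the value of $\pi$ is
$$ V(\pi) \;=\; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} R_{t}\right], $$
and the difficulty is that this expectation must be estimated from trajectories generated by a different, possibly unknown, behavior policy rather than by $\pi$ itself.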
Feb-2-2023