An Instrumental Variable Approach to Confounded Off-Policy Evaluation
Yang Xu, Jin Zhu, Chengchun Shi, Shikai Luo, Rui Song
arXiv.org Artificial Intelligence
Off-policy evaluation (OPE) estimates the discounted cumulative reward under a given target policy using an offline dataset collected from another (possibly unknown) behavior policy. OPE is important in situations where it is impractical or too costly to evaluate the target policy directly via online experimentation, including robotics (Quillen et al., 2018), precision medicine (Murphy, 2003; Kosorok and Laber, 2019; Tsiatis et al., 2019), economics, quantitative social science (Abadie and Cattaneo, 2018), recommendation systems (Li et al., 2010; Kiyohara et al., 2022), etc. Despite a large body of literature on OPE (see Section 2 for detailed discussions), many of these works rely on the assumption of no unmeasured confounders (NUC), which excludes the existence of unobserved variables that could confound either the action-reward or the action-next-state pair. This assumption, however, can be violated in real-world applications such as healthcare and the technology industry. Our paper is partly motivated by the need to evaluate the long-term treatment effects of certain app download ads on a short-video platform.
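For concreteness, the estimand that OPE targets can be written in standard discounted-reward notation (our notation, stated here for illustration rather than quoted from the abstract): with discount factor $\gamma \in [0,1)$ and rewards $R_t$ along a trajectory generated by the target policy $\pi$, the value of $\pi$ is
$$ V(\pi) \;=\; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} R_{t}\right], $$
and the difficulty is that this expectation must be estimated from trajectories generated by a different, possibly unknown, behavior policy rather than by $\pi$ itself.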
Feb-2-2023