Off-policy Evaluation in Infinite-Horizon Reinforcement Learning with Latent Confounders
Bennett, Andrew, Kallus, Nathan, Li, Lihong, Mousavi, Ali
A fundamental question in offline reinforcement learning (RL) is how to estimate the value of a target evaluation policy, defined as the long-run average reward obtained by following the policy, using data logged by running a different behavior policy. This question, known as off-policy evaluation (OPE), often arises in applications such as healthcare, education, or robotics, where experimenting by running the target policy can be expensive or even impossible, but data logged under business-as-usual practice or current standards of care are available. A central concern when using such passively observed data is that the observed actions, rewards, and transitions may be confounded by unobserved variables, which can bias standard OPE methods that assume no unobserved confounders, or equivalently that a standard Markov decision process (MDP) model holds with a fully observed state. Consider, for example, evaluating a new smartphone app that helps people living with type 1 diabetes time their insulin injections by monitoring their blood glucose levels with a wearable device. Rather than risking bad advice that may harm individuals, we may first evaluate our injection-timing policy using existing longitudinal observations of individuals' blood glucose levels over time and the timing of their insulin injections.
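For concreteness, the long-run average value referenced above admits a standard formalization; the notation below is a generic sketch and not necessarily the paper's own:

% Long-run average reward of a policy \pi; r_t denotes the reward
% received at step t when actions are selected according to \pi.
\[
  \rho(\pi) \;=\; \lim_{T \to \infty} \frac{1}{T}\,
  \mathbb{E}_{\pi}\!\left[\sum_{t=1}^{T} r_t\right]
\]
% OPE asks for an estimate of \rho(\pi_e) for a target policy \pi_e
% given only trajectories logged under a behavior policy \pi_b.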
arXiv.org Artificial Intelligence
Jul-27-2020
- Country:
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Genre:
- Research Report > New Finding (0.46)
- Industry:
- Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (1.00)