Deeply-Debiased Off-Policy Interval Estimation
Shi, Chengchun, Wan, Runzhe, Chernozhukov, Victor, Song, Rui
arXiv.org Artificial Intelligence
Reinforcement learning (RL, Sutton & Barto, 2018) is a general technique for sequential decision making that learns an optimal policy to maximize the average cumulative reward. Prior to adopting any policy in practice, it is crucial to know the impact of implementing that policy. In many real domains such as healthcare (Murphy et al., 2001; Luedtke & van der Laan, 2017; Shi et al., 2020a), robotics (Andrychowicz et al., 2020), and autonomous driving (Sallab et al., 2017), it is costly, risky, unethical, or even infeasible to evaluate a policy's impact by running it directly. This motivates the off-policy evaluation (OPE) problem, which learns a target policy's value from pre-collected data generated by a different behavior policy. In many applications (e.g., mobile health studies), the number of observations is limited. Take the OhioT1DM dataset (Marling & Bunescu, 2018) as an example: only a few thousand observations are available (Shi et al., 2020b). In such cases, in addition to a point estimate of a target policy's value, it is crucial to construct a confidence interval (CI) that quantifies the uncertainty of the value estimate. This paper is concerned with the following question: is it possible to develop a robust and efficient off-policy value estimator and provide rigorous uncertainty quantification under practically feasible conditions? We give an affirmative answer to this question.
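To make the OPE setup concrete, the sketch below shows a generic per-decision importance-sampling estimator with a normal-approximation confidence interval on a toy environment. This is not the deeply-debiased estimator proposed in the paper; it only illustrates the problem of estimating a target policy's value, with uncertainty quantification, from data collected under a different behavior policy. All names (behavior_policy, target_policy, step, etc.) are hypothetical placeholders.

```python
# Minimal OPE sketch: per-decision importance sampling + normal-approx CI.
# Assumed toy setup; NOT the paper's deeply-debiased interval estimator.
import numpy as np

rng = np.random.default_rng(0)
n_trajectories, horizon, gamma = 500, 20, 0.9

def behavior_policy(state):
    # Behavior policy: chooses action 1 with probability 0.5.
    return np.array([0.5, 0.5])

def target_policy(state):
    # Target policy we wish to evaluate: chooses action 1 with probability 0.8.
    return np.array([0.2, 0.8])

def step(state, action):
    # Toy dynamics: reward is higher when the action matches the state's sign.
    reward = float(action == (state > 0)) + 0.1 * rng.standard_normal()
    return state + rng.standard_normal(), reward

# Collect trajectories under the behavior policy and reweight toward the target.
values = []
for _ in range(n_trajectories):
    state, log_ratio, value = rng.standard_normal(), 0.0, 0.0
    for t in range(horizon):
        probs_b = behavior_policy(state)
        action = rng.choice(2, p=probs_b)
        # Accumulate the log importance ratio pi_target / pi_behavior up to time t.
        log_ratio += np.log(target_policy(state)[action]) - np.log(probs_b[action])
        state, reward = step(state, action)
        value += (gamma ** t) * np.exp(log_ratio) * reward
    values.append(value)

values = np.asarray(values)
point_estimate = values.mean()
# Normal-approximation CI; IS estimators can be high-variance, one motivation
# for the more efficient interval estimators studied in this line of work.
half_width = 1.96 * values.std(ddof=1) / np.sqrt(n_trajectories)
print(f"estimated value: {point_estimate:.3f} "
      f"(95% CI: [{point_estimate - half_width:.3f}, {point_estimate + half_width:.3f}])")
```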
May-10-2021