Behaviour Policy Estimation in Off-Policy Policy Evaluation: Calibration Matters

Aniruddh Raghu, Omer Gottesman, Yao Liu, Matthieu Komorowski, Aldo Faisal, Finale Doshi-Velez, Emma Brunskill

arXiv.org Machine Learning 

In this work, we consider the problem of estimating a behaviour policy for use in Off-Policy Policy Evaluation (OPE) when the true behaviour policy is unknown. Through a series of empirical studies, we demonstrate that accurate OPE depends strongly on the calibration of the estimated behaviour policy model, that is, how accurately the behaviour policy's action probabilities are estimated from data. We show that powerful parametric models such as neural networks can produce poorly calibrated behaviour policy models on a real-world medical dataset, and illustrate that a simple, non-parametric k-nearest neighbours model yields better-calibrated behaviour policy estimates, which in turn can be used to obtain superior importance sampling-based OPE estimates.
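The pipeline described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the synthetic logged dataset, the choice of `k`, the probability floor, and the one-step (bandit-style) importance-sampling estimator are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical logged data: 2-D states, binary actions drawn from an
# unknown behaviour policy, and observed rewards (all illustrative).
n = 500
states = rng.normal(size=(n, 2))
# True (unknown to the estimator) behaviour policy: prefers action 1
# when the first state coordinate is positive.
p_a1 = 1.0 / (1.0 + np.exp(-2.0 * states[:, 0]))
actions = (rng.random(n) < p_a1).astype(int)
rewards = rng.normal(loc=actions * states[:, 0])

def knn_behaviour_prob(s, a, states, actions, k=25):
    """Estimate pi_b(a|s) as the fraction of the k nearest logged
    states in which action a was taken (k-NN behaviour-policy model)."""
    dists = np.linalg.norm(states - s, axis=1)
    nearest = np.argsort(dists)[:k]
    return np.mean(actions[nearest] == a)

# Evaluation policy to score: deterministically take action 1.
def pi_e(a):
    return 1.0 if a == 1 else 0.0

# One-step importance-sampling estimate of the evaluation policy's
# value: weight each logged reward by pi_e(a) / pi_b_hat(a|s), with a
# small floor on pi_b_hat to avoid division by zero.
weights = np.array([
    pi_e(a) / max(knn_behaviour_prob(s, a, states, actions), 1e-3)
    for s, a in zip(states, actions)
])
v_is = np.mean(weights * rewards)
print(f"IS estimate of evaluation-policy value: {v_is:.3f}")
```

A miscalibrated behaviour model would distort the importance weights `pi_e / pi_b_hat` directly, which is why calibration of `pi_b_hat`, rather than, say, its classification accuracy, is the quantity that matters for the OPE estimate.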
