Evaluating Reinforcement Learning Algorithms in Observational Health Settings
Gottesman, Omer, Johansson, Fredrik, Meier, Joshua, Dent, Jack, Lee, Donghun, Srinivasan, Srivatsan, Zhang, Linying, Ding, Yi, Wihl, David, Peng, Xuefeng, Yao, Jiayu, Lage, Isaac, Mosch, Christopher, Lehman, Li-wei H., Komorowski, Matthieu, Faisal, Aldo, Celi, Leo Anthony, Sontag, David, Doshi-Velez, Finale
Much attention has been devoted recently to the development of machine learning algorithms with the goal of improving treatment policies in healthcare. Reinforcement learning (RL) is a sub-field of machine learning concerned with learning how to make sequences of decisions so as to optimize long-term effects. Already, RL algorithms have been proposed to identify decision-making strategies for mechanical ventilation [Prasad et al., 2017], sepsis management [Raghu et al., 2017], and treatment of schizophrenia [Shortreed et al., 2011]. However, before implementing treatment policies learned by black-box algorithms in high-stakes clinical decision problems, special care must be taken in the evaluation of these policies. Specifically, we focus on the observational setting, that is, the setting in which our RL algorithm has proposed some treatment policy and we want to evaluate it based on historical data. This setting is common in healthcare applications, where we do not wish to experiment with patients' lives without evidence that the proposed treatment strategy may be better than current practice. While formal statistical methods have been developed to assess the quality of new policies based on observational data alone [Thomas and Brunskill, 2016; Precup et al., 2000; Pearl, 2009; Imbens and Rubin, 2015], these methods rely on strong assumptions and are limited by statistical properties. We do not attempt to summarize this vast literature in this work; rather, we aim to provide a conceptual starting point for clinical and computational researchers to ask the right questions when designing and evaluating algorithms for new ways of treating patients. In the following, we describe how choices about how to summarize a history, the variance of statistical estimators, and confounders in more ad hoc measures can result in unreliable, even misleading, estimates of the quality of a treatment policy.
May-30-2018