Offline Inverse Constrained Reinforcement Learning for Safe-Critical Decision Making in Healthcare

Fang, Nan, Liu, Guiliang, Gong, Wei

arXiv.org Artificial Intelligence 

Reinforcement Learning (RL) applied in healthcare can lead to unsafe medical decisions and treatments, such as excessive dosages or abrupt changes, often because agents overlook common-sense constraints. Consequently, Constrained Reinforcement Learning (CRL) is a natural choice for safe decisions. However, specifying the exact cost function is inherently difficult in healthcare. Inverse Constrained Reinforcement Learning (ICRL) is a promising recent approach that infers constraints from expert demonstrations. However, existing ICRL settings do not align with the practical requirements of a decision-making system in healthcare, where decisions rely on historical treatment recorded in an offline dataset. To tackle these issues, we propose the Constraint Transformer (CT). Specifically, 1) we utilize a causal attention mechanism to incorporate historical decisions and observations into the constraint modeling, while employing a Non-Markovian layer for weighted constraints to capture critical states. In multiple medical scenarios, empirical results demonstrate that CT can capture unsafe states and achieve strategies that approximate lower mortality rates, reducing the occurrence probability of unsafe behaviors.

In recent years, the doctor-to-patient ratio imbalance has drawn attention, with the U.S. having only 223.1 physicians per 100,000 people (Petterson et al., 2018). AI-assisted therapy emerges as a promising solution, offering timely diagnosis and personalized care while reducing dependence on experienced physicians. Therefore, the development of an effective AI healthcare assistant is crucial.

Reinforcement learning (RL) offers a promising approach to develop AI assistants by addressing sequential decision-making. However, this method can still lead to unsafe behaviors, such as administering excessive drug dosages or making inappropriate adjustments of medical parameters.

Table 1: Proportion of unsafe vaso doses recommended by physician and DDPG policy. Vaso doses range from 0.1 to 0.2 µg/(kg min), with doses above 0.5 considered high (Bassi et al., 2013).
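
To make the quantity reported in Table 1 concrete, the following is a minimal sketch of how a proportion of unsafe vaso doses could be computed. It assumes that "unsafe" means a dose strictly above the 0.5 µg/(kg min) level that the caption describes as high; the function name `unsafe_dose_proportion` and the example doses are hypothetical and are not taken from the paper or its dataset.

```python
from typing import Sequence

# Assumption: the 0.5 ug/(kg min) level quoted as "high" in the Table 1 caption
# is used here as the cutoff for an unsafe dose.
HIGH_DOSE_THRESHOLD = 0.5


def unsafe_dose_proportion(
    doses: Sequence[float], threshold: float = HIGH_DOSE_THRESHOLD
) -> float:
    """Return the fraction of recommended doses strictly above `threshold`."""
    if not doses:
        raise ValueError("doses must be non-empty")
    # Each comparison yields a bool; summing the bools counts the unsafe doses.
    return sum(d > threshold for d in doses) / len(doses)


# Made-up dose recommendations for illustration only (not data from the paper).
example_doses = [0.1, 0.15, 0.2, 0.6, 0.9]
example_proportion = unsafe_dose_proportion(example_doses)
print(f"{example_proportion:.0%} of the example doses exceed {HIGH_DOSE_THRESHOLD}")
```

Applying the same function to the doses recommended by a physician and to those recommended by a learned policy would give two proportions that can be compared directly, which is the kind of comparison the table caption describes.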