 vasopressor


Exploring Time-Step Size in Reinforcement Learning for Sepsis Treatment

Sun, Yingchuan, Tang, Shengpu

arXiv.org Artificial Intelligence

Existing studies on reinforcement learning (RL) for sepsis management have mostly followed an established problem setup, in which patient data are aggregated into 4-hour time steps. Although concerns have been raised regarding the coarseness of this time-step size, which might distort patient dynamics and lead to suboptimal treatment policies, the extent to which this is a problem in practice remains unexplored. In this work, we conducted empirical experiments for a controlled comparison of four time-step sizes ($Δt\!=\!1,2,4,8$ h) on this domain, following an identical offline RL pipeline. To enable a fair comparison across time-step sizes, we designed action re-mapping methods that allow for evaluation of policies on datasets with different time-step sizes, and conducted cross-$Δt$ model selections under two policy learning setups. Our goal was to quantify how time-step size influences state representation learning, behavior cloning, policy training, and off-policy evaluation. Our results show that performance trends across $Δt$ vary as learning setups change, while policies learned at finer time-step sizes ($Δt = 1$ h and $2$ h) using a static behavior policy achieve the overall best performance and stability. Our work highlights time-step size as a core design choice in offline RL for healthcare and provides evidence supporting alternatives beyond the conventional 4-hour setup.
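To make the coarseness concern concrete, the following minimal sketch re-aggregates hourly records into $Δt$-hour time steps. The field names (`map`, `vaso`) and the aggregation rule (mean for observations, sum for doses) are assumptions chosen for illustration; the abstract does not describe the authors' actual pipeline.

```python
from statistics import mean


def aggregate_steps(hourly, dt):
    """Group hourly records into dt-hour time steps.

    hourly: list of dicts with hypothetical keys
        'map'  - mean arterial pressure observed in that hour
        'vaso' - vasopressor dose given in that hour
    dt: time-step size in hours (positive integer)

    Assumed rule: observations are averaged, doses are summed.
    """
    if dt < 1:
        raise ValueError("dt must be a positive number of hours")
    steps = []
    for start in range(0, len(hourly), dt):
        window = hourly[start:start + dt]
        steps.append({
            "map": mean(r["map"] for r in window),
            "vaso": sum(r["vaso"] for r in window),
        })
    return steps


# Four hours of made-up data with a blood pressure dip in the third hour.
hourly = [
    {"map": 70, "vaso": 0.0},
    {"map": 62, "vaso": 0.0},
    {"map": 55, "vaso": 0.2},
    {"map": 65, "vaso": 0.2},
]
coarse = aggregate_steps(hourly, 4)  # one 4-hour step
fine = aggregate_steps(hourly, 1)    # four 1-hour steps
```

In this toy example the single 4-hour step reports an average pressure of 63, so the dip to 55 that preceded the vasopressor dose is no longer visible, whereas the 1-hour steps preserve the order of events.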


MIMIC-Sepsis: A Curated Benchmark for Modeling and Learning from Sepsis Trajectories in the ICU

Huang, Yong, Yang, Zhongqi, Rahmani, Amir

arXiv.org Artificial Intelligence

Sepsis is a leading cause of mortality in intensive care units (ICUs), yet existing research often relies on outdated datasets, non-reproducible preprocessing pipelines, and limited coverage of clinical interventions. We introduce MIMIC-Sepsis, a curated cohort and benchmark framework derived from the MIMIC-IV database, designed to support reproducible modeling of sepsis trajectories. Our cohort includes 35,239 ICU patients with time-aligned clinical variables and standardized treatment data, including vasopressors, fluids, mechanical ventilation and antibiotics. We describe a transparent preprocessing pipeline (based on Sepsis-3 criteria, structured imputation strategies, and treatment inclusion) and release it alongside benchmark tasks focused on early mortality prediction, length-of-stay estimation, and shock onset classification. Empirical results demonstrate that incorporating treatment variables substantially improves model performance, particularly for Transformer-based architectures. MIMIC-Sepsis serves as a robust platform for evaluating predictive and sequential models in critical care research. Sepsis is a life-threatening condition caused by the body's extreme response to an infection that can lead to organ failure and even death.


Individualized Multi-Treatment Response Curves Estimation using RBF-net with Shared Neurons

Chang, Peter, Roy, Arkaprava

arXiv.org Machine Learning

Estimation of heterogeneous treatment effects from observational data has become an important problem. It plays a crucial role in determining the individualized causal effects of a treatment, which then leads to a personalized assignment of optimal treatment (Wendling et al., 2018; Rekkas et al., 2020). Estimation of such heterogeneity, however, requires reasonable representations from each treatment subgroup. With the increasing availability of large-scale health outcome data such as electronic health records (EHR) data in recent years, it has become possible to develop individualized treatment strategies efficiently. This has led to the development of several novel statistical methods, primarily tailored for binary treatment scenarios (Wendling et al., 2018; Cheng et al., 2020), with some accommodating multiple treatment settings (Brown et al., 2020; Chalkou et al., 2021). Most of these approaches are specifically designed for estimating population average treatment effects (ATEs) (Van Der Laan and Rubin, 2006; Chernozhukov et al., 2018; McCaffrey et al., 2013), and more recently, methods are being developed to estimate conditional average treatment effects (CATEs) (Taddy et al., 2016; Wager and Athey, 2018; Künzel et al., 2019; Nie and Wager, 2021). Here, we tackle a generic problem of heterogeneous treatment effect or CATE estimation in a multi-treatment setting, where the treatment responses may share some commonalities.
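For reference, the two estimands named above can be written in standard potential-outcomes notation. The symbols below ($Y(t)$ for the potential outcome under treatment $t$, $t = 0$ as the reference arm, $X$ for covariates, $T$ for the number of active treatments) are assumptions chosen for illustration and are not taken from the paper.

```latex
% CATE of treatment t versus the reference arm 0, at covariate value x
\tau_t(x) = \mathbb{E}\left[\, Y(t) - Y(0) \mid X = x \,\right], \qquad t = 1, \dots, T

% The ATE is the CATE averaged over the covariate distribution
\mathrm{ATE}_t = \mathbb{E}\left[\, \tau_t(X) \,\right] = \mathbb{E}\left[\, Y(t) - Y(0) \,\right]
```

In a multi-treatment setting there is one response curve $\tau_t(\cdot)$ per active treatment, which is where shared structure across treatments can be exploited.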


APRICOT: Acuity Prediction in Intensive Care Unit (ICU): Predicting Stability, Transitions, and Life-Sustaining Therapies

Contreras, Miguel, Silva, Brandon, Shickel, Benjamin, Baslanti, Tezcan Ozrazgat, Ren, Yuanfang, Guan, Ziyuan, Bandyopadhyay, Sabyasachi, Khezeli, Kia, Bihorac, Azra, Rashidi, Parisa

arXiv.org Artificial Intelligence

The acuity state of patients in the intensive care unit (ICU) can quickly change from stable to unstable, sometimes leading to life-threatening conditions. Early detection of deteriorating conditions can result in more timely interventions and improved survival rates. Current approaches rely on manual daily assessments. Some data-driven approaches have been developed that use mortality as a proxy of acuity in the ICU. However, these methods do not integrate acuity states to determine the stability of a patient or the need for life-sustaining therapies. In this study, we propose APRICOT (Acuity Prediction in Intensive Care Unit), a Transformer-based neural network to predict acuity state in real-time in ICU patients. We develop and extensively validate externally, temporally, and prospectively the APRICOT model on three large datasets: University of Florida Health (UFH), eICU Collaborative Research Database (eICU), and Medical Information Mart for Intensive Care (MIMIC)-IV. The performance of APRICOT shows comparable results to state-of-the-art mortality prediction models (external AUROC 0.93-0.93, temporal AUROC 0.96-0.98, and prospective AUROC 0.98) as well as acuity prediction models (external AUROC 0.80-0.81, temporal AUROC 0.77-0.78, and prospective AUROC 0.87). Furthermore, APRICOT can predict the need for life-sustaining therapies, showing comparable results to state-of-the-art ventilation prediction models (external AUROC 0.80-0.81, temporal AUROC 0.87-0.88, and prospective AUROC 0.85), and vasopressor prediction models (external AUROC 0.82-0.83, temporal AUROC 0.73-0.75, prospective AUROC 0.87). This tool allows for real-time acuity monitoring of a patient and can provide helpful information to clinicians to make timely interventions. Furthermore, the model can suggest life-sustaining therapies that the patient might need in the next hours in the ICU.
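The results above are all reported as AUROC. As a reminder of what that number measures, here is a minimal sketch that computes it directly from its pairwise definition: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name and the toy data are illustrative and are not taken from the paper.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 0/1 ground-truth labels
    scores: iterable of predicted scores (higher means more likely positive)
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Toy example: three of the four positive/negative pairs are ordered correctly.
example = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The quadratic loop is fine for a small illustration; for real cohorts a rank-based implementation from an established library would be the more practical choice.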


Explaining a machine learning decision to physicians via counterfactuals

Nagesh, Supriya, Mishra, Nina, Naamad, Yonatan, Rehg, James M., Shah, Mehul A., Wagner, Alexei

arXiv.org Artificial Intelligence

Machine learning models perform well on several healthcare tasks and can help reduce the burden on the healthcare system. However, the lack of explainability is a major roadblock to their adoption in hospitals. How can the decision of an ML model be explained to a physician? The explanations considered in this paper are counterfactuals (CFs), hypothetical scenarios that would have resulted in the opposite outcome. Specifically, time-series CFs are investigated, inspired by the way physicians converse and reason out decisions, for example: "I would have given the patient a vasopressor if their blood pressure was lower and falling". Key properties of CFs that are particularly meaningful in clinical settings are outlined: physiological plausibility, relevance to the task, and sparse perturbations. Past work on CF generation does not satisfy these properties, specifically plausibility, in that realistic time-series CFs are not generated. A variational autoencoder (VAE)-based approach is proposed that captures these desired properties. The method produces CFs that improve on prior approaches quantitatively (more plausible CFs as evaluated by their likelihood with respect to the original data distribution, and 100× faster at generating CFs) and qualitatively (2× more plausible and relevant) as evaluated by three physicians.


Self-Supervised Predictive Coding with Multimodal Fusion for Patient Deterioration Prediction in Fine-grained Time Resolution

Lee, Kwanhyung, Won, John, Hyun, Heejung, Hahn, Sangchul, Choi, Edward, Lee, Joohyung

arXiv.org Artificial Intelligence

Accurate time prediction of patients' critical events is crucial in urgent scenarios where timely decision-making is important. Though many studies have proposed automatic prediction methods using Electronic Health Records (EHR), their coarse-grained time resolutions limit their practical usage in urgent environments such as the emergency department (ED) and intensive care unit (ICU). Therefore, in this study, we propose an hourly prediction method based on self-supervised predictive coding and multi-modal fusion for two critical tasks: mortality and vasopressor need prediction. Through extensive experiments, we prove significant performance gains from both multi-modal fusion and self-supervised predictive regularization, most notably in far-future prediction, which becomes especially important in practice. Our uni-modal/bi-modal/bi-modal self-supervision scored 0.846/0.877/0.897


Relative Sparsity for Medical Decision Problems

Weisenthal, Samuel J., Thurston, Sally W., Ertefaie, Ashkan

arXiv.org Artificial Intelligence

Existing statistical methods can estimate a policy, or a mapping from covariates to decisions, which can then instruct decision makers (e.g., whether to administer hypotension treatment based on covariates blood pressure and heart rate). There is great interest in using such data-driven policies in healthcare. However, it is often important to explain to the healthcare provider, and to the patient, how a new policy differs from the current standard of care. This end is facilitated if one can pinpoint the aspects of the policy (i.e., the parameters for blood pressure and heart rate) that change when moving from the standard of care to the new, suggested policy. To this end, we adapt ideas from Trust Region Policy Optimization (TRPO). In our work, however, unlike in TRPO, the difference between the suggested policy and standard of care is required to be sparse, aiding with interpretability. This yields "relative sparsity," where, as a function of a tuning parameter, $\lambda$, we can approximately control the number of parameters in our suggested policy that differ from their counterparts in the standard of care (e.g., heart rate only). We propose a criterion for selecting $\lambda$, perform simulations, and illustrate our method with a real, observational healthcare dataset, deriving a policy that is easy to explain in the context of the current standard of care. Our work promotes the adoption of data-driven decision aids, which have great potential to improve health outcomes.
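One way to picture the role of $\lambda$ is as the weight of a sparsity penalty on the difference between the two policies. The form below is an illustrative assumption, not the objective stated in the paper: $V(\beta)$ is a hypothetical estimate of the value of a policy with parameters $\beta$, $b$ denotes the parameters of the standard-of-care policy, and an $L_1$ penalty stands in for whatever sparsity-inducing term the authors actually use.

```latex
% Illustrative only: value of the suggested policy, penalized by how many
% (and how strongly) its parameters depart from the standard of care.
\hat{\beta}_{\lambda} = \arg\max_{\beta} \; V(\beta) \;-\; \lambda \sum_{k} \left| \beta_k - b_k \right|
```

Under this reading, a larger $\lambda$ drives more of the differences $\beta_k - b_k$ to zero, so fewer parameters (e.g., heart rate only) distinguish the suggested policy from the standard of care.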


Learning Optimal Treatment Strategies for Sepsis Using Offline Reinforcement Learning in Continuous Space

Wang, Zeyu, Zhao, Huiying, Ren, Peng, Zhou, Yuxi, Sheng, Ming

arXiv.org Artificial Intelligence

Sepsis is a leading cause of death in the ICU. It is a disease requiring complex interventions in a short period of time, but its optimal treatment strategy remains uncertain. Evidence suggests that currently used treatment strategies are problematic and may cause harm to patients. To address this decision problem, we propose a new medical decision model based on historical data to help clinicians recommend the best reference option for real-time treatment. Our model combines offline reinforcement learning and deep reinforcement learning to address the inability of traditional reinforcement learning to interact with the environment in the medical field, while enabling our model to make decisions in a continuous state-action space. We demonstrate that, on average, the treatments recommended by the model are more valuable and reliable than those recommended by clinicians. In a large validation dataset, we find that the patients whose actual doses from clinicians matched the decisions made by the AI have the lowest mortality rates. Our model provides personalized and clinically interpretable treatment decisions for sepsis to improve patient care.


Offline reinforcement learning with uncertainty for treatment strategies in sepsis

Liu, Ran, Greenstein, Joseph L., Fackler, James C., Bergmann, Jules, Bembea, Melania M., Winslow, Raimond L.

arXiv.org Artificial Intelligence

Guideline-based treatment for sepsis and septic shock is difficult because sepsis is a disparate range of life-threatening organ dysfunctions whose pathophysiology is not fully understood. Early intervention in sepsis is crucial for patient outcome, yet those interventions have adverse effects and are frequently overadministered. Greater personalization is necessary, as no single action is suitable for all patients. We present a novel application of reinforcement learning in which we identify optimal recommendations for sepsis treatment from data, estimate their confidence level, and identify treatment options infrequently observed in training data. Rather than a single recommendation, our method can present several treatment options. We examine learned policies and discover that reinforcement learning is biased against aggressive intervention due to the confounding relationship between mortality and level of treatment received. We mitigate this bias using subspace learning, and develop methodology that can yield more accurate learning policies across healthcare applications.


Unifying Cardiovascular Modelling with Deep Reinforcement Learning for Uncertainty Aware Control of Sepsis Treatment

Nanayakkara, Thesath, Clermont, Gilles, Langmead, Christopher James, Swigon, David

arXiv.org Artificial Intelligence

Sepsis is the leading cause of mortality in the ICU, responsible for 6% of all hospitalizations and 35% of all in-hospital deaths in the USA. However, there is no universally agreed upon strategy for vasopressor and fluid administration. It has also been observed that different patients respond differently to treatment, highlighting the need for individualized treatment. Vasopressors and fluids are administered with specific effects on cardiovascular physiology in mind, and medical research has suggested physiologic, hemodynamically guided approaches to treatment. Thus, we propose a novel approach, exploiting and unifying complementary strengths of Mathematical Modelling, Deep Learning, Reinforcement Learning and Uncertainty Quantification, to learn individualized, safe, and uncertainty aware treatment strategies. We first infer patient-specific, dynamic cardiovascular states using a novel physiology-driven recurrent neural network trained in an unsupervised manner. This information, along with a learned low dimensional representation of the patient's lab history and observable data, is then used to derive value distributions using Batch Distributional Reinforcement Learning. Moreover, in a safety-critical domain it is essential to know what our agent does and does not know. To this end, we also quantify the model uncertainty associated with each patient state and action, and propose a general framework for uncertainty aware, interpretable treatment policies. This framework can be tweaked easily to reflect a clinician's own confidence in the framework, and can be easily modified to factor in human expert opinion whenever it is accessible. Using representative patients and a validation cohort, we show that our method has learned physiologically interpretable, generalizable policies.