sofa score
Training and Evaluation of Guideline-Based Medical Reasoning in LLMs
Staniek, Michael, Sokolov, Artem, Riezler, Stefan
Machine learning for early prediction in medicine has recently shown breakthrough performance; however, the focus on improving prediction accuracy has led to a neglect of faithful explanations that are required to gain the trust of medical practitioners. The goal of this paper is to teach LLMs to follow medical consensus guidelines step-by-step in their reasoning and prediction process. Since consensus guidelines are ubiquitous in medicine, instantiations of verbalized medical inference rules to electronic health records provide data for fine-tuning LLMs to learn consensus rules and possible exceptions thereof for many medical areas. Consensus rules also enable an automatic evaluation of the model's inference process regarding its derivation correctness (evaluating correct and faithful deduction of a conclusion from given premises) and value correctness (comparing predicted values against real-world measurements). We exemplify our work using the complex Sepsis-3 consensus definition. Our experiments show that small fine-tuned models outperform one-shot learning of considerably larger LLMs that are prompted with the explicit definition and models that are trained on medical texts including consensus definitions. Since fine-tuning on verbalized rule instantiations of a specific medical area yields nearly perfect derivation correctness for rules (and exceptions) on unseen patient data in that area, the bottleneck for early prediction is not out-of-distribution generalization, but the orthogonal problem of generalization into the future by forecasting sparsely and irregularly sampled clinical variables. We show that the latter results can be improved by integrating the output representations of a time series forecasting model with the LLM in a multimodal setup.
- Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.04)
- Pacific Ocean > North Pacific Ocean > Gulf of Thailand (0.04)
- North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
Improving ARDS Diagnosis Through Context-Aware Concept Bottleneck Models
Narain, Anish, Majumdar, Ritam, Narayanan, Nikita, Marshall, Dominic, Parbhoo, Sonali
Large, publicly available clinical datasets have emerged as a novel resource for understanding disease heterogeneity and exploring personalization of therapy. These datasets are derived from data not originally collected for research purposes and, as a result, are often incomplete and lack critical labels. Many AI tools have been developed to retrospectively label these datasets, such as by performing disease classification; however, they often suffer from limited interpretability. Previous work has attempted to explain predictions using Concept Bottleneck Models (CBMs), which learn interpretable concepts that map to higher-level clinical ideas, facilitating human evaluation. However, these models often experience performance limitations when the concepts fail to adequately explain or characterize the task. We use the identification of Acute Respiratory Distress Syndrome (ARDS) as a challenging test case to demonstrate the value of incorporating contextual information from clinical notes to improve CBM performance. Our approach leverages a Large Language Model (LLM) to process clinical notes and generate additional concepts, resulting in a 10% performance gain over existing methods.
Interpretable Machine Learning for Resource Allocation with Application to Ventilator Triage
Grand-Clément, Julien, Goh, You Hui, Chan, Carri, Goyal, Vineet, Chuang, Elizabeth
Rationing of healthcare resources is a challenging decision that policy makers and providers may be forced to make during a pandemic, natural disaster, or mass casualty event. Well-defined guidelines to triage scarce life-saving resources must be designed to promote transparency, trust, and consistency. To facilitate buy-in and use during high-stress situations, these guidelines need to be interpretable and operational. We propose a novel data-driven model to compute interpretable triage guidelines based on policies for Markov decision processes that can be represented as simple sequences of decision trees ("tree policies"). In particular, we characterize the properties of optimal tree policies and present an algorithm based on dynamic programming recursions to compute good tree policies. We utilize this methodology to obtain simple, novel triage guidelines for ventilator allocations for COVID-19 patients, based on real patient data from Montefiore hospitals. We also compare the performance of our guidelines to the official New York State guidelines that were developed in 2015 (well before the COVID-19 pandemic). Our empirical study shows that the number of excess deaths associated with ventilator shortages could be reduced significantly using our policy. Our work highlights the limitations of the existing official triage guidelines, which need to be adapted specifically to COVID-19 before being successfully deployed.
- Europe > Italy (0.04)
- North America > United States > New York > Bronx County > New York City (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Research Report > New Finding (0.93)
- Research Report > Experimental Study (0.92)
Early Prediction of Causes (not Effects) in Healthcare by Long-Term Clinical Time Series Forecasting
Staniek, Michael, Fracarolli, Marius, Hagmann, Michael, Riezler, Stefan
Machine learning for early syndrome diagnosis aims to solve the intricate task of predicting a ground truth label that most often is the outcome (effect) of a medical consensus definition applied to observed clinical measurements (causes), given clinical measurements observed several hours before. Instead of focusing on the prediction of the future effect, we propose to directly predict the causes via time series forecasting (TSF) of clinical variables and determine the effect by applying the gold standard consensus definition to the forecasted values. This method has the invaluable advantage of being straightforwardly interpretable to clinical practitioners, and because model training does not rely on a particular label anymore, the forecasted data can be used to predict any consensus-based label. We exemplify our method by means of long-term TSF with Transformer models, with a focus on accurate prediction of sparse clinical variables involved in the SOFA-based Sepsis-3 definition and the new Simplified Acute Physiology Score (SAPS-II) definition. Our experiments are conducted on two datasets and show that contrary to recent proposals which advocate set function encoders for time series and direct multi-step decoders, best results are achieved by a combination of standard dense encoders with iterative multi-step decoders. The key for success of iterative multi-step decoding can be attributed to its ability to capture cross-variate dependencies and to a student forcing training strategy that teaches the model to rely on its own previous time step predictions for the next time step prediction.
- North America > Canada > Quebec > Montreal (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Europe > Germany (0.04)
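The idea in the abstract above, forecasting the causes and then applying the consensus definition to the forecast, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the forecasted clinical variables have already been mapped to the six SOFA organ sub-scores (each 0 to 4), and it reduces the Sepsis-3 criterion to "total SOFA rises by at least 2 points over baseline while infection is suspected". The names `sofa_total` and `sepsis3_onset` are hypothetical.

```python
from typing import List, Optional, Sequence


def sofa_total(components: Sequence[int]) -> int:
    """Sum six organ sub-scores (each in 0..4) into a total SOFA score."""
    if len(components) != 6 or any(not 0 <= c <= 4 for c in components):
        raise ValueError("expected six sub-scores in the range 0..4")
    return sum(components)


def sepsis3_onset(
    baseline: Sequence[int],
    forecast: List[Sequence[int]],
    suspected_infection: bool,
) -> Optional[int]:
    """Return the first forecast step where total SOFA exceeds baseline by >= 2.

    Simplifying assumption: the infection flag is given as an input and
    is constant over the forecast horizon. Returns None if the rule
    never fires.
    """
    if not suspected_infection:
        return None
    base = sofa_total(baseline)
    for step, components in enumerate(forecast):
        if sofa_total(components) - base >= 2:
            return step
    return None


# Hypothetical forecast: sub-scores derived from forecasted variables.
baseline = [0, 0, 0, 0, 0, 0]
forecast = [
    [0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
]
onset = sepsis3_onset(baseline, forecast, suspected_infection=True)
```

Because the label is derived by rule from the forecasted values, the same forecast could be passed through a different consensus definition without retraining, which is the reuse the abstract points to.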
Zero Shot Health Trajectory Prediction Using Transformer
Renc, Pawel, Jia, Yugang, Samir, Anthony E., Was, Jaroslaw, Li, Quanzheng, Bates, David W., Sitek, Arkadiusz
Integrating modern machine learning and clinical decision-making has great promise for mitigating healthcare's increasing cost and complexity. We introduce the Enhanced Transformer for Health Outcome Simulation (ETHOS), a novel application of the transformer deep-learning architecture for analyzing high-dimensional, heterogeneous, and episodic health data. ETHOS is trained using Patient Health Timelines (PHTs; detailed, tokenized records of health events) to predict future health trajectories, leveraging a zero-shot learning approach. ETHOS represents a significant advancement in foundation model development for healthcare analytics, eliminating the need for labeled data and model fine-tuning. Its ability to simulate various treatment pathways and consider patient-specific factors positions ETHOS as a tool for care optimization and addressing biases in healthcare delivery. Future developments will expand ETHOS' capabilities to incorporate a wider range of data types and data sources. Our work demonstrates a pathway toward accelerated AI development and deployment in healthcare.
- Oceania > New Zealand (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.94)
- Overview > Innovation (0.66)
- Health & Medicine > Consumer Health (1.00)
- Health & Medicine > Health Care Technology > Medical Record (0.93)
- Health & Medicine > Therapeutic Area > Endocrinology (0.92)
- Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.67)
Individualized Multi-Treatment Response Curves Estimation using RBF-net with Shared Neurons
Estimation of heterogeneous treatment effects from observational data has become an important problem. It plays a crucial role in determining the individualized causal effects of a treatment, which then leads to a personalized assignment of optimal treatment (Wendling et al., 2018; Rekkas et al., 2020). Estimation of such heterogeneity however requires reasonable representations from each treatment subgroup. With the increasing availability of large-scale health outcome data such as electronic health records (EHR) data in recent years, it has become possible to develop individualized treatment strategies efficiently. This led to the development of several novel statistical methods, primarily tailored for binary treatment scenarios (Wendling et al., 2018; Cheng et al., 2020), with some accommodating multiple treatment settings (Brown et al., 2020; Chalkou et al., 2021). Most of these approaches are specifically designed for estimating population average treatment effects (ATEs) (Van Der Laan and Rubin, 2006; Chernozhukov et al., 2018; McCaffrey et al., 2013) and more recently, methods are being developed to estimate conditional average treatment effects (CATEs) (Taddy et al., 2016; Wager and Athey, 2018; Künzel et al., 2019; Nie and Wager, 2021). Here, we tackle a generic problem of heterogeneous treatment effect or CATE estimation in a multi-treatment setting, where the treatment responses may share some commonalities.
- Europe > Middle East > Malta > Northern Region > Western District > Attard (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- North America > United States > Florida (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
- Information Technology > Data Science > Data Mining (0.66)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
Cluster trajectory of SOFA score in predicting mortality in sepsis
Ke, Yuhe, Tang, Matilda Swee Sun, Loh, Celestine Jia Ling, Abdullah, Hairil Rizal, Shannon, Nicholas Brian
Objective: Sepsis is a life-threatening condition. The Sequential Organ Failure Assessment (SOFA) score is commonly used to assess organ dysfunction and predict ICU mortality, but it is taken as a static measurement and fails to capture dynamic changes. This study aims to investigate the relationship between dynamic changes in SOFA scores over the first 72 hours of ICU admission and patient outcomes. Design, setting, and participants: 3,253 patients in the Medical Information Mart for Intensive Care IV database who met the Sepsis-3 criteria and were admitted from the emergency department with at least 72 hours of ICU admission and full-active resuscitation status were analysed. Group-based trajectory modelling with dynamic time warping and k-means clustering identified distinct trajectory patterns in dynamic SOFA scores. They were subsequently compared using Python. Main outcome measures: Outcomes, including hospital and ICU mortality, length of stay in hospital and ICU, and readmission during hospital stay, were collected. Discharge time from ICU to wards and cut-offs at 7 and 14 days were taken. Results: Four clusters were identified: A (consistently low SOFA scores), B (rapid increase followed by a decline in SOFA scores), C (higher baseline scores with gradual improvement), and D (persistently elevated scores). Cluster D had the longest ICU and hospital stays and the highest ICU and hospital mortality. Discharge rates from ICU were similar for Clusters A and B, while Cluster C had initially comparable rates but a slower transition to ward. Conclusion: Monitoring dynamic changes in SOFA score is valuable for assessing sepsis severity and treatment responsiveness.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Asia > Singapore (0.05)
- North America > United States > Georgia > Fulton County > Atlanta (0.05)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.68)
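The abstract above clusters SOFA trajectories using dynamic time warping (DTW) as the distance. As a rough illustration of that distance only, and not the study's code, here is a minimal standard-library sketch of classic DTW with absolute difference as the local cost. The function name `dtw_distance` and the example trajectories are hypothetical.

```python
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic O(len(a) * len(b)) dynamic time warping distance.

    Local cost is the absolute difference between two scores. Each
    step may advance in a, in b, or in both. No window constraint
    is applied.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    inf = float("inf")
    # prev[j] holds the best cumulative cost of aligning a[:i-1] with b[:j].
    prev = [inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        curr = [inf] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]


# Hypothetical SOFA trajectories sampled at fixed intervals.
steady_low = [2, 2, 2, 2]
early_peak = [2, 6, 4, 2]
late_peak = [2, 2, 6, 4, 2]

d_shape = dtw_distance(early_peak, late_peak)   # same shape, shifted in time
d_level = dtw_distance(steady_low, early_peak)  # different shape
```

In this example the two peaked trajectories align with zero cost despite the time shift, while the flat trajectory does not. That tolerance to shifts is what makes DTW a natural distance for trajectory clustering. A k-means step would additionally need a DTW-compatible way to average the trajectories in a cluster, which this sketch omits.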
Learning and DiSentangling Patient Static Information from Time-series Electronic HEalth Record (STEER)
Recent work in machine learning for healthcare has raised concerns about patient privacy and algorithmic fairness. For example, previous work has shown that patient self-reported race can be predicted from medical data that does not explicitly contain racial information. However, the extent of data identification is unknown, and we lack ways to develop models whose outcomes are minimally affected by such information. Here we systematically investigated the ability of time-series electronic health record data to predict patient static information. We found that not only the raw time-series data, but also learned representations from machine learning models, can be trained to predict a variety of static information with area under the receiver operating characteristic curve as high as 0.851 for biological sex, 0.869 for binarized age and 0.810 for self-reported race. Such high predictive performance can be extended to a wide range of comorbidity factors and exists even when the model was trained for different tasks, using different cohorts, using different model architectures and databases. Given the privacy and fairness concerns these findings pose, we develop a variational autoencoder-based approach that learns a structured latent space to disentangle patient-sensitive attributes from time-series data. Our work thoroughly investigates the ability of machine learning models to encode patient static information from time-series electronic health records and introduces a general approach to protect patient-sensitive attribute information for downstream tasks.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- Asia > Mongolia (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
Validity problems in clinical machine learning by indirect data labeling using consensus definitions
Hagmann, Michael, Schamoni, Shigehiko, Riezler, Stefan
We demonstrate a validity problem of machine learning in the vital application area of disease diagnosis in medicine. It arises when target labels in training data are determined by an indirect measurement, and the fundamental measurements needed to determine this indirect measurement are included in the input data representation. Machine learning models trained on this data will learn nothing else but to exactly reconstruct the known target definition. Such models show perfect performance on similarly constructed test data but will fail catastrophically on real-world examples where the defining fundamental measurements are not or only incompletely available. We present a general procedure allowing identification of problematic datasets and black-box machine learning models trained on them, and exemplify our detection procedure on the task of early prediction of sepsis.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Europe > Germany (0.05)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Identifying acute illness phenotypes via deep temporal interpolation and clustering network on physiologic signatures
Ren, Yuanfang, Li, Yanjun, Loftus, Tyler J., Balch, Jeremy, Abbott, Kenneth L., Datta, Shounak, Ruppert, Matthew M., Guan, Ziyuan, Shickel, Benjamin, Rashidi, Parisa, Ozrazgat-Baslanti, Tezcan, Bihorac, Azra
The initial hours of hospital admission impact clinical trajectory, but early clinical decisions often suffer due to data paucity. With clustering analysis for vital signs within six hours of admission, patient phenotypes with distinct pathophysiological signatures and outcomes may support early clinical decisions. We created a single-center, longitudinal EHR dataset for 75,762 adults admitted to a tertiary care center for 6+ hours. We proposed a deep temporal interpolation and clustering network to extract latent representations from sparse, irregularly sampled vital sign data and derived distinct patient phenotypes in a training cohort (n=41,502). Model and hyper-parameters were chosen based on a validation cohort (n=17,415). A test cohort (n=16,845) was used to analyze reproducibility and correlation with biomarkers. The training, validation, and testing cohorts had similar distributions of age (54-55 yrs), sex (55% female), race, comorbidities, and illness severity. Four clusters were identified. Phenotype A (18%) had the most comorbid disease, with higher rates of prolonged respiratory insufficiency, acute kidney injury, sepsis, and three-year mortality. Phenotypes B (33%) and C (31%) had diffuse patterns of mild organ dysfunction. Phenotype B had favorable short-term outcomes but the second-highest three-year mortality. Phenotype C had favorable clinical outcomes. Phenotype D (17%) had early/persistent hypotension, a high rate of early surgery, and substantial biomarker rate of inflammation but the second-lowest three-year mortality. After comparing phenotypes' SOFA scores, clustering results did not simply repeat other acuity assessments. In a heterogeneous cohort, four phenotypes with distinct categories of disease and outcomes were identified by a deep temporal interpolation and clustering network. This tool may impact triage decisions and clinical decision-support under time constraints.
- North America > United States > Florida > Alachua County > Gainesville (0.14)
- Europe > United Kingdom > England (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)
- Information Technology > Artificial Intelligence > Natural Language (0.67)