Goto

Collaborating Authors

 delirium


Adverse Event Extraction from Discharge Summaries: A New Dataset, Annotation Scheme, and Initial Findings

Guellil, Imane, Andres, Salomé, Anand, Atul, Guthrie, Bruce, Zhang, Huayu, Hasan, Abul, Wu, Honghan, Alex, Beatrice

arXiv.org Artificial Intelligence

In this work, we present a manually annotated corpus for Adverse Event (AE) extraction from discharge summaries of elderly patients, a population often underrepresented in clinical NLP resources. The dataset includes 14 clinically significant AEs-such as falls, delirium, and intracranial haemorrhage, along with contextual attributes like negation, diagnosis type, and in-hospital occurrence. Uniquely, the annotation schema supports both discontinuous and overlapping entities, addressing challenges rarely tackled in prior work. We evaluate multiple models using FlairNLP across three annotation granularities: fine-grained, coarse-grained, and coarse-grained with negation. While transformer-based models (e.g., BERT-cased) achieve strong performance on document-level coarse-grained extraction (F1 = 0.943), performance drops notably for fine-grained entity-level tasks (e.g., F1 = 0.675), particularly for rare events and complex attributes. These results demonstrate that despite high-level scores, significant challenges remain in detecting underrepresented AEs and capturing nuanced clinical language. Developed within a Trusted Research Environment (TRE), the dataset is available upon request via DataLoch and serves as a robust benchmark for evaluating AE extraction methods and supporting future cross-dataset generalisation.


Transformer representation learning is necessary for dynamic multi-modal physiological data on small-cohort patients

Wang, Bingxu, Ge, Min, Cai, Kunzhi, Zhang, Yuqi, Zhou, Zeyi, Li, Wenjiao, Guo, Yachong, Wang, Wei, Zhou, Qing

arXiv.org Artificial Intelligence

Transformer representation learning is necessary for dynamic multi-modal physiological data on small-cohort patients Bingxu Wang, Min Ge, Kunzhi Cai, Yuqi Zhang, Zeyi Zhou, Wenjiao Li, Yachong Guo,, Wei Wang,, and Qing Zhou, Department of Thoracic and Cardiovascular Surgery, The Affiliated Drum Tower Hospital of Nanjing University Medical School, Kuang Yaming Honors School, Nanjing University, Nanjing 210023, China National Laboratory of Solid State Microstructure, Department of Physics, Nanjing University, Nanjing 210093, China E-mail: yguo@nju.edu.cn; Abstract Postoperative delirium (POD), a severe neuropsychiatric complication affecting nearly 50% of high-risk surgical patients, is defined as an acute disorder of attention and cognition, It remains significantly underdiagnosed in the intensive care units (ICUs) due to subjective monitoring methods. Early and accurate diagnosis of POD is critical and achievable. Here, we propose a POD prediction framework comprising a Transformer representation model followed by traditional machine learning algorithms. We curated the first multi-modal POD dataset encompass-1 ing two patient types and evaluated the various Transformer architectures for representation learning. Empirical results indicate a consistent improvements of sensitivity and Youden index in patient TYPE I using Transformer representations, particularly our fusion adaptation of Pathformer. By enabling effective delirium diagnosis from postoperative day 1 to 3, our extensive experimental findings emphasize the potential of multi-modal physiological data and highlight the necessity of representation learning via multi-modal Transformer architecture in clinical diagnosis. Introduction Postoperative delirium(POD), a prevalent acute neuropsychiatric syndrome 1,2, affects more than 50% of surgical patients and significantly elevates morbidity and mortality risks 3 . Early identification is crucial yet challenging 4, primarily due to subjective assessment criteria and incomplete understanding of underlying pathophysiological mechanisms 5 .


Prediction of Delirium Risk in Mild Cognitive Impairment Using Time-Series data, Machine Learning and Comorbidity Patterns -- A Retrospective Study

Ramamoorthy, Santhakumar, Rani, Priya, Mahon, James, Mathews, Glenn, Cloherty, Shaun, Babaei, Mahdi

arXiv.org Artificial Intelligence

Delirium represents a significant clinical concern characterized by high morbidity and mortality rates, particularly in patients with mild cognitive impairment (MCI). This study investigates the associated risk factors for delirium by analyzing the comorbidity patterns relevant to MCI and developing a longitudinal predictive model leveraging machine learning methodologies. A retrospective analysis utilizing the MIMIC-IV v2.2 database was performed to evaluate comorbid conditions, survival probabilities, and predictive modeling outcomes. The examination of comorbidity patterns identified distinct risk profiles for the MCI population. Kaplan-Meier survival analysis demonstrated that individuals with MCI exhibit markedly reduced survival probabilities when developing delirium compared to their non-MCI counterparts, underscoring the heightened vulnerability within this cohort. For predictive modeling, a Long Short-Term Memory (LSTM) ML network was implemented utilizing time-series data, demographic variables, Charlson Comorbidity Index (CCI) scores, and an array of comorbid conditions. The model demonstrated robust predictive capabilities with an AUROC of 0.93 and an AUPRC of 0.92. This study underscores the critical role of comorbidities in evaluating delirium risk and highlights the efficacy of time-series predictive modeling in pinpointing patients at elevated risk for delirium development.


Quantifying Circadian Desynchrony in ICU Patients and Its Association with Delirium

Ren, Yuanfang, Davidson, Andrea E., Zhang, Jiaqing, Contreras, Miguel, Patel, Ayush K., Gumz, Michelle, Ozrazgat-Baslanti, Tezcan, Rashidi, Parisa, Bihorac, Azra

arXiv.org Artificial Intelligence

Background: Circadian desynchrony characterized by the misalignment between an individual's internal biological rhythms and external environmental cues, significantly affects various physiological processes and health outcomes. Quantifying circadian desynchrony often requires prolonged and frequent monitoring, and currently, an easy tool for this purpose is missing. Additionally, its association with the incidence of delirium has not been clearly explored. Methods: A prospective observational study was carried out in intensive care units (ICU) of a tertiary hospital. Circadian transcriptomics of blood monocytes from 86 individuals were collected on two consecutive days, although a second sample could not be obtained from all participants. Using two public datasets comprised of healthy volunteers, we replicated a model for determining internal circadian time. We developed an approach to quantify circadian desynchrony by comparing internal circadian time and external blood collection time. We applied the model and quantified circadian desynchrony index among ICU patients, and investigated its association with the incidence of delirium. Results: The replicated model for determining internal circadian time achieved comparable high accuracy. The quantified circadian desynchrony index was significantly higher among critically ill ICU patients compared to healthy subjects, with values of 10.03 hours vs 2.50-2.95 hours (p < 0.001). Most ICU patients had a circadian desynchrony index greater than 9 hours. Additionally, the index was lower in patients whose blood samples were drawn after 3pm, with values of 5.00 hours compared to 10.01-10.90 hours in other groups (p < 0.001)...


MELON: Multimodal Mixture-of-Experts with Spectral-Temporal Fusion for Long-Term Mobility Estimation in Critical Care

Zhang, Jiaqing, Contreras, Miguel, Sena, Jessica, Davidson, Andrea, Ren, Yuanfang, Guan, Ziyuan, Ozrazgat-Baslanti, Tezcan, Loftus, Tyler J., Nerella, Subhash, Bihorac, Azra, Rashidi, Parisa

arXiv.org Artificial Intelligence

Patient mobility monitoring in intensive care is critical for ensuring timely interventions and improving clinical outcomes. While accelerometry-based sensor data are widely adopted in training artificial intelligence models to estimate patient mobility, existing approaches face two key limitations highlighted in clinical practice: (1) modeling the long-term accelerometer data is challenging due to the high dimensionality, variability, and noise, and (2) the absence of efficient and robust methods for long-term mobility assessment. To overcome these challenges, we introduce MELON, a novel multimodal framework designed to predict 12-hour mobility status in the critical care setting. MELON leverages the power of a dual-branch network architecture, combining the strengths of spectrogram-based visual representations and sequential accelerometer statistical features. MELON effectively captures global and fine-grained mobility patterns by integrating a pre-trained image encoder for rich frequency-domain feature extraction and a Mixture-of-Experts encoder for sequence modeling. We trained and evaluated the MELON model on the multimodal dataset of 126 patients recruited from nine Intensive Care Units at the University of Florida Health Shands Hospital main campus in Gainesville, Florida. Experiments showed that MELON outperforms conventional approaches for 12-hour mobility status estimation with an overall area under the receiver operating characteristic curve (AUROC) of 0.82 (95\%, confidence interval 0.78-0.86). Notably, our experiments also revealed that accelerometer data collected from the wrist provides robust predictive performance compared with data from the ankle, suggesting a single-sensor solution that can reduce patient burden and lower deployment costs...


MANDARIN: Mixture-of-Experts Framework for Dynamic Delirium and Coma Prediction in ICU Patients: Development and Validation of an Acute Brain Dysfunction Prediction Model

Contreras, Miguel, Sena, Jessica, Davidson, Andrea, Zhang, Jiaqing, Ozrazgat-Baslanti, Tezcan, Ren, Yuanfang, Guan, Ziyuan, Balch, Jeremy, Loftus, Tyler, Nerella, Subhash, Bihorac, Azra, Rashidi, Parisa

arXiv.org Artificial Intelligence

Acute brain dysfunction (ABD) is a common, severe ICU complication, presenting as delirium or coma and leading to prolonged stays, increased mortality, and cognitive decline. Traditional screening tools like the Glasgow Coma Scale (GCS), Confusion Assessment Method (CAM), and Richmond Agitation-Sedation Scale (RASS) rely on intermittent assessments, causing delays and inconsistencies. In this study, we propose MANDARIN (Mixture-of-Experts Framework for Dynamic Delirium and Coma Prediction in ICU Patients), a 1.5M-parameter mixture-of-experts neural network to predict ABD in real-time among ICU patients. The model integrates temporal and static data from the ICU to predict the brain status in the next 12 to 72 hours, using a multi-branch approach to account for current brain status. The MANDARIN model was trained on data from 92,734 patients (132,997 ICU admissions) from 2 hospitals between 2008-2019 and validated externally on data from 11,719 patients (14,519 ICU admissions) from 15 hospitals and prospectively on data from 304 patients (503 ICU admissions) from one hospital in 2021-2024. Three datasets were used: the University of Florida Health (UFH) dataset, the electronic ICU Collaborative Research Database (eICU), and the Medical Information Mart for Intensive Care (MIMIC)-IV dataset. MANDARIN significantly outperforms the baseline neurological assessment scores (GCS, CAM, and RASS) for delirium prediction in both external (AUROC 75.5% CI: 74.2%-76.8% vs 68.3% CI: 66.9%-69.5%) and prospective (AUROC 82.0% CI: 74.8%-89.2% vs 72.7% CI: 65.5%-81.0%) cohorts, as well as for coma prediction (external AUROC 87.3% CI: 85.9%-89.0% vs 72.8% CI: 70.6%-74.9%, and prospective AUROC 93.4% CI: 88.5%-97.9% vs 67.7% CI: 57.7%-76.8%) with a 12-hour lead time. This tool has the potential to assist clinicians in decision-making by continuously monitoring the brain status of patients in the ICU.


DeLLiriuM: A large language model for delirium prediction in the ICU using structured EHR

Contreras, Miguel, Kapoor, Sumit, Zhang, Jiaqing, Davidson, Andrea, Ren, Yuanfang, Guan, Ziyuan, Ozrazgat-Baslanti, Tezcan, Nerella, Subhash, Bihorac, Azra, Rashidi, Parisa

arXiv.org Artificial Intelligence

Delirium is an acute confusional state that has been shown to affect up to 31% of patients in the intensive care unit (ICU). Early detection of this condition could lead to more timely interventions and improved health outcomes. While artificial intelligence (AI) models have shown great potential for ICU delirium prediction using structured electronic health records (EHR), most of them have not explored the use of state-of-the-art AI models, have been limited to single hospitals, or have been developed and validated on small cohorts. The use of large language models (LLM), models with hundreds of millions to billions of parameters, with structured EHR data could potentially lead to improved predictive performance. In this study, we propose DeLLiriuM, a novel LLM-based delirium prediction model using EHR data available in the first 24 hours of ICU admission to predict the probability of a patient developing delirium during the rest of their ICU admission. We develop and validate DeLLiriuM on ICU admissions from 104,303 patients pertaining to 195 hospitals across three large databases: the eICU Collaborative Research Database, the Medical Information Mart for Intensive Care (MIMIC)-IV, and the University of Florida Health's Integrated Data Repository. The performance measured by the area under the receiver operating characteristic curve (AUROC) showed that DeLLiriuM outperformed all baselines in two external validation sets, with 0.77 (95% confidence interval 0.76-0.78) and 0.84 (95% confidence interval 0.83-0.85) across 77,543 patients spanning 194 hospitals. To the best of our knowledge, DeLLiriuM is the first LLM-based delirium prediction tool for the ICU based on structured EHR data, outperforming deep learning baselines which employ structured features and can provide helpful information to clinicians for timely interventions.


LLMs for clinical risk prediction

Rezk, Mohamed, Silva, Patricia Cabanillas, Dahlweid, Fried-Michael

arXiv.org Artificial Intelligence

This study compares the efficacy of GPT-4 and clinalytix Medical AI in predicting the clinical risk of delirium development. Findings indicate that GPT-4 exhibited significant deficiencies in identifying positive cases and struggled to provide reliable probability estimates for delirium risk, while clinalytix Medical AI demonstrated superior accuracy. A thorough analysis of the large language model's (LLM) outputs elucidated potential causes for these discrepancies, consistent with limitations reported in extant literature. These results underscore the challenges LLMs face in accurately diagnosing conditions and interpreting complex clinical data. While LLMs hold substantial potential in healthcare, they are currently unsuitable for independent clinical decision-making. Instead, they should be employed in assistive roles, complementing clinical expertise. Continued human oversight remains essential to ensure optimal outcomes for both patients and healthcare providers.


Leveraging Computer Vision in the Intensive Care Unit (ICU) for Examining Visitation and Mobility

Siegel, Scott, Zhang, Jiaqing, Bandyopadhyay, Sabyasachi, Nerella, Subhash, Silva, Brandon, Baslanti, Tezcan, Bihorac, Azra, Rashidi, Parisa

arXiv.org Artificial Intelligence

Despite the importance of closely monitoring patients in the Intensive Care Unit (ICU), many aspects are still assessed in a limited manner due to the time constraints imposed on healthcare providers. For example, although excessive visitations during rest hours can potentially exacerbate the risk of circadian rhythm disruption and delirium, it is not captured in the ICU. Likewise, while mobility can be an important indicator of recovery or deterioration in ICU patients, it is only captured sporadically or not captured at all. In the past few years, the computer vision field has found application in many domains by reducing the human burden. Using computer vision systems in the ICU can also potentially enable non-existing assessments or enhance the frequency and accuracy of existing assessments while reducing the staff workload. In this study, we leverage a state-of-the-art noninvasive computer vision system based on depth imaging to characterize ICU visitations and patients' mobility. We then examine the relationship between visitation and several patient outcomes, such as pain, acuity, and delirium. We found an association between deteriorating patient acuity and the incidence of delirium with increased visitations. In contrast, self-reported pain, reported using the Defense and Veteran Pain Rating Scale (DVPRS), was correlated with decreased visitations. Our findings highlight the feasibility and potential of using noninvasive autonomous systems to monitor ICU patients.


Clustering of Disease Trajectories with Explainable Machine Learning: A Case Study on Postoperative Delirium Phenotypes

Zheng, Xiaochen, Schürch, Manuel, Chen, Xingyu, Komninou, Maria Angeliki, Schüpbach, Reto, Allam, Ahmed, Bartussek, Jan, Krauthammer, Michael

arXiv.org Artificial Intelligence

The identification of phenotypes within complex diseases or syndromes is a fundamental component of precision medicine, which aims to adapt healthcare to individual patient characteristics. Postoperative delirium (POD) is a complex neuropsychiatric condition with significant heterogeneity in its clinical manifestations and underlying pathophysiology. We hypothesize that POD comprises several distinct phenotypes, which cannot be directly observed in clinical practice. Identifying these phenotypes could enhance our understanding of POD pathogenesis and facilitate the development of targeted prevention and treatment strategies. In this paper, we propose an approach that combines supervised machine learning for personalized POD risk prediction with unsupervised clustering techniques to uncover potential POD phenotypes. We first demonstrate our approach using synthetic data, where we simulate patient cohorts with predefined phenotypes based on distinct sets of informative features. We aim to mimic any clinical disease with our synthetic data generation method. By training a predictive model and applying SHAP, we show that clustering patients in the SHAP feature importance space successfully recovers the true underlying phenotypes, outperforming clustering in the raw feature space. We then present a case study using real-world data from a cohort of elderly surgical patients. The results showcase the utility of our approach in uncovering clinically relevant subtypes of complex disorders like POD, paving the way for more precise and personalized treatment strategies.