mimic-iii
Appendix A Proofs of Formal Claims
By pre-training the model on domain-specific data, PubMedBERT is expected to have a better understanding of biomedical concepts, terminology, and language patterns compared to general domain models like BERT-base and BERT-large [95]. The main advantage of using PubMedBERT for biomedical text mining tasks is its domain-specific knowledge, which can lead to improved performance and more accurate results when fine-tuned on various downstream tasks, such as named entity recognition, relation extraction, document classification, and question answering. Since PubMedBERT is pre-trained on a large corpus of biomedical text, it is better suited to capturing the unique language patterns, complex terminology, and the relationships between entities in the biomedical domain.
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- Asia > Middle East > Israel (0.04)
- Health & Medicine > Health Care Providers & Services (0.94)
- Health & Medicine > Therapeutic Area (0.71)
- Health & Medicine > Diagnostic Medicine > Imaging (0.46)
- North America > Canada > Ontario > Toronto (0.14)
- Asia > Singapore (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > Middle East > Israel (0.04)
Knowledge Graph Augmented Large Language Models for Disease Prediction
Wang, Ruiyu, Vinh, Tuan, Xu, Ran, Zhou, Yuyin, Lu, Jiaying, Yang, Carl, Pasquel, Francisco
Electronic health records (EHRs) support powerful clinical prediction models, but existing methods typically provide coarse, post hoc explanations that offer limited value for patient-level decision making. We introduce a knowledge graph (KG)-guided chain-of-thought (CoT) framework that generates clinically grounded and temporally consistent reasoning for visit-level disease prediction in MIMIC-III. ICD-9 codes are mapped to PrimeKG, from which disease-relevant nodes and multi-hop reasoning paths are extracted and used as scaffolds for CoT generation; only explanations whose conclusions match observed outcomes are retained. Lightweight LLaMA-3.1-Instruct-8B and Gemma-7B models are then fine-tuned on this supervision corpus. Across ten PrimeKG-mapped diseases and limited training cohorts (400 and 1000 cases), KG-guided models outperform strong classical baselines, achieving AUROC values of 0.66 to 0.70 and macro-AUPR values of 0.40 to 0.47. The models also transfer zero-shot to the CRADLE cohort, improving accuracy from approximately 0.40-0.51 to 0.72-0.77. A blinded clinician evaluation shows consistent preference for KG-guided CoT explanations in clarity, relevance, and clinical correctness.
- North America > United States > California > Santa Cruz County > Santa Cruz (0.14)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- North America > United States > Georgia > Fulton County > Atlanta (0.05)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.68)
Delta-XAI: A Unified Framework for Explaining Prediction Changes in Online Time Series Monitoring
Kim, Changhun, Mun, Yechan, Jang, Hyeongwon, Lee, Eunseo, Hahn, Sangchul, Yang, Eunho
Explaining online time series monitoring models is crucial across sensitive domains such as healthcare and finance, where temporal and contextual prediction dynamics underpin critical decisions. While recent XAI methods have improved the explainability of time series models, they mostly analyze each time step independently, overlooking temporal dependencies. This results in further challenges: explaining prediction changes is non-trivial, methods fail to leverage online dynamics, and evaluation remains difficult. To address these challenges, we propose Delta-XAI, which adapts 14 existing XAI methods through a wrapper function and introduces a principled evaluation suite for the online setting, assessing diverse aspects, such as faithfulness, sufficiency, and coherence. Experiments reveal that classical gradient-based methods, such as Integrated Gradients (IG), can outperform recent approaches when adapted for temporal analysis. Building on this, we propose Shifted Window Integrated Gradients (SWING), which incorporates past observations in the integration path to systematically capture temporal dependencies and mitigate out-of-distribution effects. Extensive experiments consistently demonstrate the effectiveness of SWING across diverse settings with respect to diverse metrics. Our code is publicly available at https://anonymous.4open.science/r/Delta-XAI.
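For readers unfamiliar with Integrated Gradients, the following is a minimal sketch of the standard method with a fixed baseline, approximated by a midpoint Riemann sum. It is not SWING and not the authors' code; the names `integrated_gradients`, `toy_model`, and `toy_model_grad` are hypothetical, and the toy model is an assumption chosen so the result can be checked by hand.

```python
from typing import Callable, List, Sequence


def integrated_gradients(
    grad_fn: Callable[[Sequence[float]], List[float]],
    x: Sequence[float],
    baseline: Sequence[float],
    steps: int = 50,
) -> List[float]:
    """Approximate IG_i = (x_i - b_i) * integral over alpha in [0, 1] of
    dF/dx_i evaluated at b + alpha * (x - b), using a midpoint Riemann sum."""
    diff = [xi - bi for xi, bi in zip(x, baseline)]
    avg_grad = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval
        point = [bi + alpha * di for bi, di in zip(baseline, diff)]
        for i, g in enumerate(grad_fn(point)):
            avg_grad[i] += g / steps
    return [di * gi for di, gi in zip(diff, avg_grad)]


# Hypothetical toy model F(x) = x0^2 + 3*x1 with its analytic gradient.
def toy_model(x: Sequence[float]) -> float:
    return x[0] ** 2 + 3.0 * x[1]


def toy_model_grad(x: Sequence[float]) -> List[float]:
    return [2.0 * x[0], 3.0]


attributions = integrated_gradients(toy_model_grad, [2.0, 1.0], [0.0, 0.0])
```

A useful sanity check is the completeness property: the attributions should sum to the difference between the model output at the input and at the baseline. According to the abstract, SWING changes the integration path to incorporate past observations, which this sketch does not attempt.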
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (0.93)
- Information Technology > Data Science > Data Mining (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)
REACT-LLM: A Benchmark for Evaluating LLM Integration with Causal Features in Clinical Prognostic Tasks
Wang, Linna, You, Zhixuan, Zhang, Qihui, Wen, Jiunan, Shi, Ji, Chen, Yimin, Wang, Yusen, Ding, Fanqi, Feng, Ziliang, Lu, Li
Large Language Models (LLMs) and causal learning each hold strong potential for clinical decision making (CDM). However, their synergy remains poorly understood, largely due to the lack of systematic benchmarks evaluating their integration in clinical risk prediction. In real-world healthcare, identifying features with causal influence on outcomes is crucial for actionable and trustworthy predictions. While recent work highlights LLMs' emerging causal reasoning abilities, there is a lack of comprehensive benchmarks to assess their causal learning and performance informed by causal features in clinical risk prediction. To address this, we introduce REACT-LLM, a benchmark designed to evaluate whether combining LLMs with causal features can enhance clinical prognostic performance and potentially outperform traditional machine learning (ML) methods. Unlike existing LLM-clinical benchmarks that often focus on a limited set of outcomes, REACT-LLM evaluates 7 clinical outcomes across 2 real-world datasets, comparing 15 prominent LLMs, 6 traditional ML models, and 3 causal discovery (CD) algorithms. Our findings indicate that while LLMs perform reasonably in clinical prognostics, they have not yet outperformed traditional ML models. Integrating causal features derived from CD algorithms into LLMs offers limited performance gains, primarily due to the strict assumptions of many CD methods, which are often violated in complex clinical data. While the direct integration yields limited improvement, our benchmark reveals a more promising synergy.
- Asia > Middle East > Israel (0.04)
- Asia > China > Yunnan Province > Kunming (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
Embedding-Space Data Augmentation to Prevent Membership Inference Attacks in Clinical Time Series Forecasting
Fracarolli, Marius, Staniek, Michael, Riezler, Stefan
Balancing strong privacy guarantees with high predictive performance is critical for time series forecasting (TSF) tasks involving Electronic Health Records (EHR). In this study, we explore how data augmentation can mitigate Membership Inference Attacks (MIA) on TSF models. We show that retraining with synthetic data can substantially reduce the effectiveness of loss-based MIAs by reducing the attacker's true-positive to false-positive ratio. The key challenge is generating synthetic samples that closely resemble the original training data to confuse the attacker, while also introducing enough novelty to enhance the model's ability to generalize to unseen data. We examine multiple augmentation strategies -- Zeroth-Order Optimization (ZOO), a variant of ZOO constrained by Principal Component Analysis (ZOO-PCA), and MixUp -- to strengthen model resilience without sacrificing accuracy. Our experimental results show that ZOO-PCA yields the best reductions in TPR/FPR ratio for MIA attacks without sacrificing performance on test data.
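To make the TPR/FPR metric concrete, here is a minimal sketch of a generic loss-threshold membership inference attack, which flags a sample as a training member when the model's loss on it falls below a threshold. This is an illustration under assumptions, not the authors' attack or evaluation code; the function name `attack_tpr_fpr`, the threshold, and the loss values are all hypothetical.

```python
from typing import Sequence, Tuple


def attack_tpr_fpr(
    member_losses: Sequence[float],
    nonmember_losses: Sequence[float],
    threshold: float,
) -> Tuple[float, float]:
    """Score a loss-threshold attack: a sample is predicted to be a
    training member when its loss is strictly below `threshold`."""
    true_positives = sum(loss < threshold for loss in member_losses)
    false_positives = sum(loss < threshold for loss in nonmember_losses)
    tpr = true_positives / len(member_losses)
    fpr = false_positives / len(nonmember_losses)
    return tpr, fpr


# Made-up per-sample losses, for illustration only.
member_losses = [0.1, 0.2, 0.9, 0.05]      # samples seen during training
nonmember_losses = [0.8, 0.3, 1.2, 0.15]   # held-out samples

tpr, fpr = attack_tpr_fpr(member_losses, nonmember_losses, threshold=0.25)
ratio = tpr / fpr  # guard against fpr == 0 in real use
```

A defense of the kind the abstract describes would aim to push this ratio toward 1, meaning the attacker flags members no more often than non-members.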
- Europe > Austria > Vienna (0.14)
- Europe > Germany (0.04)
- North America > United States > Texas > Dallas County > Dallas (0.04)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Health & Medicine > Therapeutic Area > Immunology (0.94)
Improving Clinical Dataset Condensation with Mode Connectivity-based Trajectory Surrogates
Nganjimi, Pafue Christy, Soltan, Andrew, Belgrave, Danielle, Clifton, Lei, Clifton, David A., Thakur, Anshul
Dataset condensation (DC) enables the creation of compact, privacy-preserving synthetic datasets that can match the utility of real patient records, supporting democratised access to highly regulated clinical data for developing downstream clinical models. State-of-the-art DC methods supervise synthetic data by aligning the training dynamics of models trained on real and those trained on synthetic data, typically using full stochastic gradient descent (SGD) trajectories as alignment targets; however, these trajectories are often noisy, high-curvature, and storage-intensive, leading to unstable gradients, slow convergence, and substantial memory overhead. We address these limitations by replacing full SGD trajectories with smooth, low-loss parametric surrogates, specifically quadratic Bézier curves that connect the initial and final model states from real training trajectories. These mode-connected paths provide noise-free, low-curvature supervision signals that stabilise gradients, accelerate convergence, and eliminate the need for dense trajectory storage. We theoretically justify Bézier-mode connections as effective surrogates for SGD paths and empirically show that the proposed method outperforms state-of-the-art condensation approaches across five clinical datasets, yielding condensed datasets that enable clinically effective model development.
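As a reminder of the underlying geometry, a quadratic Bézier curve between two parameter vectors is B(t) = (1 - t)^2 * start + 2(1 - t)t * control + t^2 * end for t in [0, 1]. The sketch below evaluates that formula on plain Python lists. It is an illustration only: the name `quadratic_bezier` and the example vectors are hypothetical, and how the control point is chosen or trained in the paper is not stated in the abstract.

```python
from typing import List, Sequence


def quadratic_bezier(
    start: Sequence[float],
    control: Sequence[float],
    end: Sequence[float],
    t: float,
) -> List[float]:
    """Evaluate B(t) = (1-t)^2 * start + 2(1-t)t * control + t^2 * end."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    w_start = (1.0 - t) ** 2
    w_control = 2.0 * (1.0 - t) * t
    w_end = t ** 2
    return [
        w_start * s + w_control * c + w_end * e
        for s, c, e in zip(start, control, end)
    ]


# Hypothetical two-parameter "model states" for illustration.
theta_initial = [0.0, 0.0]
theta_final = [2.0, 0.0]
theta_control = [1.0, 2.0]  # assumed control point; its selection is not specified here

midpoint = quadratic_bezier(theta_initial, theta_control, theta_final, 0.5)
```

The curve always passes through the two endpoints (t = 0 and t = 1), so only the single control point needs to be stored in addition to them, which is consistent with the storage savings over dense trajectories that the abstract mentions.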
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.28)
- Europe > United Kingdom > England > Greater London > London (0.04)
- Asia > Middle East > Israel (0.04)
- Asia > China (0.04)
EMR-AGENT: Automating Cohort and Feature Extraction from EMR Databases
Lee, Kwanhyung, Hong, Sungsoo, Park, Joonhyung, Lim, Jeonghyeop, Choi, Juhwan, Yoon, Donghwee, Yang, Eunho
Machine learning models for clinical prediction rely on structured data extracted from Electronic Medical Records (EMRs), yet this process remains dominated by hardcoded, database-specific pipelines for cohort definition, feature selection, and code mapping. These manual efforts limit scalability, reproducibility, and cross-institutional generalization. To address this, we introduce EMR-AGENT (Automated Generalized Extraction and Navigation Tool), an agent-based framework that replaces manual rule writing with dynamic, language model-driven interaction to extract and standardize structured clinical data. Our framework automates cohort selection, feature extraction, and code mapping through interactive querying of databases. Our modular agents iteratively observe query results and reason over schema and documentation, using SQL not just for data retrieval but also as a tool for database observation and decision making. This eliminates the need for hand-crafted, schema-specific logic. To enable rigorous evaluation, we develop a benchmarking codebase for three EMR databases (MIMIC-III, eICU, SICdb), including both seen and unseen schema settings. Our results demonstrate strong performance and generalization across these databases, highlighting the feasibility of automating a process previously thought to require expert-driven design. The code will be released publicly at https://github.com/AITRICS/EMR-AGENT/tree/main. For a demonstration, please visit our anonymous demo page: https://anonymoususer-max600.github.io/EMR_AGENT/
- Health & Medicine > Therapeutic Area (1.00)
- Health & Medicine > Health Care Technology > Medical Record (1.00)
- Health & Medicine > Diagnostic Medicine (0.88)