Goto

Collaborating Authors

 sdoh


ClinNoteAgents: An LLM Multi-Agent System for Predicting and Interpreting Heart Failure 30-Day Readmission from Clinical Notes

Zhou, Rongjia, Li, Chengzhuo, Yang, Carl, Lu, Jiaying

arXiv.org Artificial Intelligence

Heart failure (HF) is one of the leading causes of rehospitalization among older adults in the United States. Although clinical notes contain rich, detailed patient information and make up a large portion of electronic health records (EHRs), they remain underutilized for HF readmission risk analysis. Traditional computational models for HF readmission often rely on expert-crafted rules, medical thesauri, and ontologies to interpret clinical notes, which are typically written under time pressure and may contain misspellings, abbreviations, and domain-specific jargon. We present ClinNoteAgents, an LLM-based multi-agent framework that transforms free-text clinical notes into (1) structured representations of clinical and social risk factors for association analysis and (2) clinician-style abstractions for HF 30-day readmission prediction. We evaluate ClinNoteAgents on 3,544 notes from 2,065 patients (readmission rate=35.16%), demonstrating strong performance in extracting risk factors from free-text, identifying key contributing factors, and predicting readmission risk. By reducing reliance on structured fields and minimizing manual annotation and model training, ClinNoteAgents provides a scalable and interpretable approach to note-based HF readmission risk modeling in data-limited healthcare systems.


Integration of Large Language Models and Traditional Deep Learning for Social Determinants of Health Prediction

Landes, Paul, Sun, Jimeng, Cross, Adam

arXiv.org Artificial Intelligence

Social Determinants of Health (SDoH) are economic, social and personal circumstances that affect or influence an individual's health status. SDoHs have shown to be correlated to wellness outcomes, and therefore, are useful to physicians in diagnosing diseases and in decision-making. In this work, we automatically extract SDoHs from clinical text using traditional deep learning and Large Language Models (LLMs) to find the advantages and disadvantages of each on an existing publicly available dataset. Our models outperform a previous reference point on a multilabel SDoH classification by 10 points, and we present a method and model to drastically speed up classification (12X execution time) by eliminating expensive LLM processing. The method we present combines a more nimble and efficient solution that leverages the power of the LLM for precision and traditional deep learning methods for efficiency. We also show highly performant results on a dataset supplemented with synthetic data and several traditional deep learning models that outperform LLMs. Our models and methods offer the next iteration of automatic prediction of SDoHs that impact at-risk patients.


Unmasking Societal Biases in Respiratory Support for ICU Patients through Social Determinants of Health

Moukheiber, Mira, Moukheiber, Lama, Moukheiber, Dana, Lee, Hyung-Chul

arXiv.org Artificial Intelligence

Unmasking Societal Biases in Respiratory Support for ICU Patients through Social Determinants of Health Mira Moukheiber 1, Lama Moukheiber 1, Dana Moukheiber 1 and Hyung-Chul Lee 2, 1 Massachusetts Institute of Technology 2 Seoul National University College of Medicine, Seoul National University Hospital, Department of Anesthesiology and Pain Medicine vital@snu.ac.kr Abstract In critical care settings, where precise and timely interventions are crucial for health outcomes, evaluating disparities in patient outcomes is important. Current approaches often fall short in comprehensively understanding and evaluating the impact of respiratory support interventions on individuals affected by social determinants of health. Attributes such as gender, race, and age are commonly assessed and essential, but provide only a partial view of the complexities faced by diverse populations. In this study, we focus on two clinically motivated tasks: prolonged mechanical ventilation and successful weaning. We also perform fairness audits on the models' predictions across demographic groups and social determinants of health to better understand the health inequities in respiratory interventions in the intensive care unit. We also release a temporal benchmark dataset, verified by clinical experts, to enable benchmarking of clinical respiratory intervention tasks. 1 Introduction Critically-ill patients often find themselves in the intensive care unit (ICU) seeking specialized support for respiratory distress [ Doyle et al., 1995; Ware and Matthay, 2000 ] . Despite advances in supportive treatments, the in-hospital mortality rate remains 40% for conditions such as acute lung injury and acute respiratory distress syndrome [ Rubenfeld et al., 2005; Sweatt and Levitt, 2014 ] .


Mining Social Determinants of Health for Heart Failure Patient 30-Day Readmission via Large Language Model

Shao, Mingchen, Kang, Youjeong, Hu, Xiao, Kwak, Hyunjung Gloria, Yang, Carl, Lu, Jiaying

arXiv.org Artificial Intelligence

Heart Failure (HF) affects millions of Americans and leads to high readmission rates, posing significant healthcare challenges. While Social Determinants of Health (SDOH) such as socioeconomic status and housing stability play critical roles in health outcomes, they are often underrepresented in structured EHRs and hidden in unstructured clinical notes. This study leverages advanced large language models (LLMs) to extract SDOHs from clinical text and uses logistic regression to analyze their association with HF readmissions.


Integrating Social Determinants of Health into Knowledge Graphs: Evaluating Prediction Bias and Fairness in Healthcare

Shang, Tianqi, He, Weiqing, Chen, Tianlong, Ding, Ying, Wu, Huanmei, Zhou, Kaixiong, Shen, Li

arXiv.org Artificial Intelligence

Social determinants of health (SDoH) play a crucial role in patient health outcomes, yet their integration into biomedical knowledge graphs remains underexplored. This study addresses this gap by constructing an SDoH-enriched knowledge graph using the MIMIC-III dataset and PrimeKG. We introduce a novel fairness formulation for graph embeddings, focusing on invariance with respect to sensitive SDoH information. Via employing a heterogeneous-GCN model for drug-disease link prediction, we detect biases related to various SDoH factors. To mitigate these biases, we propose a post-processing method that strategically reweights edges connected to SDoHs, balancing their influence on graph representations. This approach represents one of the first comprehensive investigations into fairness issues within biomedical knowledge graphs incorporating SDoH. Our work not only highlights the importance of considering SDoH in medical informatics but also provides a concrete method for reducing SDoH-related biases in link prediction tasks, paving the way for more equitable healthcare recommendations. Our code is available at \url{https://github.com/hwq0726/SDoH-KG}.


Extracting Social Determinants of Health from Pediatric Patient Notes Using Large Language Models: Novel Corpus and Methods

Fu, Yujuan, Ramachandran, Giridhar Kaushik, Dobbins, Nicholas J, Park, Namu, Leu, Michael, Rosenberg, Abby R., Lybarger, Kevin, Xia, Fei, Uzuner, Ozlem, Yetisgen, Meliha

arXiv.org Artificial Intelligence

Social determinants of health (SDoH) play a critical role in shaping health outcomes, particularly in pediatric populations where interventions can have long-term implications. SDoH are frequently studied in the Electronic Health Record (EHR), which provides a rich repository for diverse patient data. In this work, we present a novel annotated corpus, the Pediatric Social History Annotation Corpus (PedSHAC), and evaluate the automatic extraction of detailed SDoH representations using fine-tuned and in-context learning methods with Large Language Models (LLMs). PedSHAC comprises annotated social history sections from 1,260 clinical notes obtained from pediatric patients within the University of Washington (UW) hospital system. Employing an event-based annotation scheme, PedSHAC captures ten distinct health determinants to encompass living and economic stability, prior trauma, education access, substance use history, and mental health with an overall annotator agreement of 81.9 F1. Our proposed fine-tuning LLM-based extractors achieve high performance at 78.4 F1 for event arguments. In-context learning approaches with GPT-4 demonstrate promise for reliable SDoH extraction with limited annotated examples, with extraction performance at 82.3 F1 for event triggers.


Evaluating the Impact of Social Determinants on Health Prediction in the Intensive Care Unit

Yang, Ming Ying, Kwak, Gloria Hyunjung, Pollard, Tom, Celi, Leo Anthony, Ghassemi, Marzyeh

arXiv.org Artificial Intelligence

Social determinants of health (SDOH) -- the conditions in which people live, grow, and age -- play a crucial role in a person's health and well-being. There is a large, compelling body of evidence in population health studies showing that a wide range of SDOH is strongly correlated with health outcomes. Yet, a majority of the risk prediction models based on electronic health records (EHR) do not incorporate a comprehensive set of SDOH features as they are often noisy or simply unavailable. Our work links a publicly available EHR database, MIMIC-IV, to well-documented SDOH features. We investigate the impact of such features on common EHR prediction tasks across different patient populations. We find that community-level SDOH features do not improve model performance for a general patient population, but can improve data-limited model fairness for specific subpopulations. We also demonstrate that SDOH features are vital for conducting thorough audits of algorithmic biases beyond protective attributes. We hope the new integrated EHR-SDOH database will enable studies on the relationship between community health and individual outcomes and provide new benchmarks to study algorithmic biases beyond race, gender, and age.


Large Language Models to Identify Social Determinants of Health in Electronic Health Records

Guevara, Marco, Chen, Shan, Thomas, Spencer, Chaunzwa, Tafadzwa L., Franco, Idalid, Kann, Benjamin, Moningi, Shalini, Qian, Jack, Goldstein, Madeleine, Harper, Susan, Aerts, Hugo JWL, Savova, Guergana K., Mak, Raymond H., Bitterman, Danielle S.

arXiv.org Artificial Intelligence

Social determinants of health (SDoH) have an important impact on patient outcomes but are incompletely collected from the electronic health records (EHR). This study researched the ability of large language models to extract SDoH from free text in EHRs, where they are most commonly documented, and explored the role of synthetic clinical text for improving the extraction of these scarcely documented, yet extremely valuable, clinical data. 800 patient notes were annotated for SDoH categories, and several transformer-based models were evaluated. The study also experimented with synthetic data generation and assessed for algorithmic bias. Our best-performing models were fine-tuned Flan-T5 XL (macro-F1 0.71) for any SDoH, and Flan-T5 XXL (macro-F1 0.70). The benefit of augmenting fine-tuning with synthetic data varied across model architecture and size, with smaller Flan-T5 models (base and large) showing the greatest improvements in performance (delta F1 +0.12 to +0.23). Model performance was similar on the in-hospital system dataset but worse on the MIMIC-III dataset. Our best-performing fine-tuned models outperformed zero- and few-shot performance of ChatGPT-family models for both tasks. These fine-tuned models were less likely than ChatGPT to change their prediction when race/ethnicity and gender descriptors were added to the text, suggesting less algorithmic bias (p<0.05). At the patient-level, our models identified 93.8% of patients with adverse SDoH, while ICD-10 codes captured 2.0%. Our method can effectively extracted SDoH information from clinic notes, performing better compare to GPT zero- and few-shot settings. These models could enhance real-world evidence on SDoH and aid in identifying patients needing social support.


SODA: A Natural Language Processing Package to Extract Social Determinants of Health for Cancer Studies

Yu, Zehao, Yang, Xi, Dang, Chong, Adekkanattu, Prakash, Patra, Braja Gopal, Peng, Yifan, Pathak, Jyotishman, Wilson, Debbie L., Chang, Ching-Yuan, Lo-Ciganic, Wei-Hsuan, George, Thomas J., Hogan, William R., Guo, Yi, Bian, Jiang, Wu, Yonghui

arXiv.org Artificial Intelligence

Objective: We aim to develop an open-source natural language processing (NLP) package, SODA (i.e., SOcial DeterminAnts), with pre-trained transformer models to extract social determinants of health (SDoH) for cancer patients, examine the generalizability of SODA to a new disease domain (i.e., opioid use), and evaluate the extraction rate of SDoH using cancer populations. Methods: We identified SDoH categories and attributes and developed an SDoH corpus using clinical notes from a general cancer cohort. We compared four transformer-based NLP models to extract SDoH, examined the generalizability of NLP models to a cohort of patients prescribed with opioids, and explored customization strategies to improve performance. We applied the best NLP model to extract 19 categories of SDoH from the breast (n=7,971), lung (n=11,804), and colorectal cancer (n=6,240) cohorts. Results and Conclusion: We developed a corpus of 629 cancer patients notes with annotations of 13,193 SDoH concepts/attributes from 19 categories of SDoH. The Bidirectional Encoder Representations from Transformers (BERT) model achieved the best strict/lenient F1 scores of 0.9216 and 0.9441 for SDoH concept extraction, 0.9617 and 0.9626 for linking attributes to SDoH concepts. Fine-tuning the NLP models using new annotations from opioid use patients improved the strict/lenient F1 scores from 0.8172/0.8502 to 0.8312/0.8679. The extraction rates among 19 categories of SDoH varied greatly, where 10 SDoH could be extracted from >70% of cancer patients, but 9 SDoH had a low extraction rate (<70% of cancer patients). The SODA package with pre-trained transformer models is publicly available at https://github.com/uf-hobiinformatics-lab/SDoH_SODA.


The 2022 n2c2/UW Shared Task on Extracting Social Determinants of Health

Lybarger, Kevin, Yetisgen, Meliha, Uzuner, Özlem

arXiv.org Artificial Intelligence

Objective: The n2c2/UW SDOH Challenge explores the extraction of social determinant of health (SDOH) information from clinical notes. The objectives include the advancement of natural language processing (NLP) information extraction techniques for SDOH and clinical information more broadly. This paper presents the shared task, data, participating teams, performance results, and considerations for future work. Materials and Methods: The task used the Social History Annotated Corpus (SHAC), which consists of clinical text with detailed event-based annotations for SDOH events such as alcohol, drug, tobacco, employment, and living situation. Each SDOH event is characterized through attributes related to status, extent, and temporality. The task includes three subtasks related to information extraction (Subtask A), generalizability (Subtask B), and learning transfer (Subtask C). In addressing this task, participants utilized a range of techniques, including rules, knowledge bases, n-grams, word embeddings, and pretrained language models (LM). Results: A total of 15 teams participated, and the top teams utilized pretrained deep learning LM. The top team across all subtasks used a sequence-to-sequence approach achieving 0.901 F1 for Subtask A, 0.774 F1 Subtask B, and 0.889 F1 for Subtask C. Conclusions: Similar to many NLP tasks and domains, pretrained LM yielded the best performance, including generalizability and learning transfer. An error analysis indicates extraction performance varies by SDOH, with lower performance achieved for conditions, like substance use and homelessness, that increase health risks (risk factors) and higher performance achieved for conditions, like substance abstinence and living with family, that reduce health risks (protective factors).