Moukheiber, Lama
EchoQA: A Large Collection of Instruction Tuning Data for Echocardiogram Reports
Moukheiber, Lama, Moukheiber, Mira, Moukheiber, Dana, Ju, Jae-Woo, Lee, Hyung-Chul
We introduce a novel question-answering (QA) dataset using echocardiogram reports sourced from the Medical Information Mart for Intensive Care database. This dataset is specifically designed to enhance QA systems in cardiology, consisting of 771,244 QA pairs addressing a wide array of cardiac abnormalities and their severity. We compare large language models (LLMs), including open-source and biomedical-specific models for zero-shot evaluation, and closed-source models for zero-shot and three-shot evaluation. Our results show that fine-tuning LLMs improves performance across various QA metrics, validating the value of our dataset. Clinicians also qualitatively evaluate the best-performing model to assess the LLM responses for correctness. Further, we conduct fine-grained fairness audits to assess the bias-performance trade-off of LLMs across various social determinants of health. Our objective is to propel the field forward by establishing a benchmark for LLM AI agents aimed at supporting clinicians with cardiac differential diagnoses, thereby reducing the documentation burden that contributes to clinician burnout and enabling healthcare professionals to focus more on patient care.
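The zero-shot and three-shot settings mentioned in the abstract differ only in how many worked demonstrations precede the query. The sketch below shows one plausible way to assemble such a prompt. The function name `build_few_shot_prompt`, the `Report:/Question:/Answer:` layout, and the placeholder strings are assumptions made for illustration, not the prompt format used in the paper.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    examples: List[Tuple[str, str, str]],
    report: str,
    question: str,
) -> str:
    """Join (report, question, answer) demonstrations and one open query.

    An empty `examples` list yields a zero-shot prompt; three
    demonstrations yield a three-shot prompt. The layout is hypothetical.
    """
    blocks = []
    for ex_report, ex_question, ex_answer in examples:
        blocks.append(
            f"Report: {ex_report}\nQuestion: {ex_question}\nAnswer: {ex_answer}"
        )
    # The final block leaves the answer slot empty for the model to complete.
    blocks.append(f"Report: {report}\nQuestion: {question}\nAnswer:")
    return "\n\n".join(blocks)


# Placeholder strings only; no real report text is reproduced here.
demos = [(f"<report {i}>", f"<question {i}>", f"<answer {i}>") for i in range(3)]
three_shot = build_few_shot_prompt(demos, "<new report>", "<new question>")
zero_shot = build_few_shot_prompt([], "<new report>", "<new question>")
```

Keeping the demonstrations as a plain list makes it easy to compare zero-shot and three-shot runs on the same query by changing only the `examples` argument.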
Unmasking Societal Biases in Respiratory Support for ICU Patients through Social Determinants of Health
Moukheiber, Mira, Moukheiber, Lama, Moukheiber, Dana, Lee, Hyung-Chul
Mira Moukheiber (1), Lama Moukheiber (1), Dana Moukheiber (1), and Hyung-Chul Lee (2)
(1) Massachusetts Institute of Technology; (2) Seoul National University College of Medicine, Seoul National University Hospital, Department of Anesthesiology and Pain Medicine. vital@snu.ac.kr
Abstract
In critical care settings, where precise and timely interventions are crucial for health outcomes, evaluating disparities in patient outcomes is important. Current approaches often fall short in comprehensively understanding and evaluating the impact of respiratory support interventions on individuals affected by social determinants of health. Attributes such as gender, race, and age are commonly assessed and essential, but provide only a partial view of the complexities faced by diverse populations. In this study, we focus on two clinically motivated tasks: prolonged mechanical ventilation and successful weaning. We also perform fairness audits on the models' predictions across demographic groups and social determinants of health to better understand the health inequities in respiratory interventions in the intensive care unit. In addition, we release a temporal benchmark dataset, verified by clinical experts, to enable benchmarking of clinical respiratory intervention tasks.
1 Introduction
Critically ill patients often find themselves in the intensive care unit (ICU) seeking specialized support for respiratory distress [Doyle et al., 1995; Ware and Matthay, 2000]. Despite advances in supportive treatments, the in-hospital mortality rate remains 40% for conditions such as acute lung injury and acute respiratory distress syndrome [Rubenfeld et al., 2005; Sweatt and Levitt, 2014].
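A fairness audit of the kind described in the abstract above typically compares a performance metric across subgroups. The sketch below is a minimal illustration that uses accuracy as the metric and the largest between-group difference as the summary. Both choices, the function names, and the toy labels are assumptions, since the text here does not specify which metrics the study reports.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def accuracy_by_group(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    groups: Sequence[Hashable],
) -> Dict[Hashable, float]:
    """Compute prediction accuracy separately for each subgroup label."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {group: correct[group] / total[group] for group in total}


def max_gap(per_group: Dict[Hashable, float]) -> float:
    """Largest difference in the metric between any two subgroups."""
    values = list(per_group.values())
    return max(values) - min(values)


# Toy labels for illustration only; "A" and "B" are hypothetical subgroups.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 1, 1, 1, 0]
groups = ["A", "A", "A", "B", "B", "B"]

per_group = accuracy_by_group(y_true, y_pred, groups)
gap = max_gap(per_group)
```

The same structure works for other metrics, for example by counting only positive cases to obtain a per-group true positive rate.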
Looking Beyond What You See: An Empirical Analysis on Subgroup Intersectional Fairness for Multi-label Chest X-ray Classification Using Social Determinants of Racial Health Inequities
Moukheiber, Dana, Mahindre, Saurabh, Moukheiber, Lama, Moukheiber, Mira, Gao, Mingchen
There has been significant progress in implementing deep learning models in disease diagnosis using chest X-rays. Despite these advancements, inherent biases in these models can lead to disparities in prediction accuracy across protected groups. In this study, we propose a framework to achieve accurate diagnostic outcomes and ensure fairness across intersectional groups in high-dimensional chest X-ray multi-label classification. Transcending traditional protected attributes, we consider complex interactions within social determinants, enabling a more granular benchmark and evaluation of fairness. We present a simple and robust method that involves retraining the last classification layer of pre-trained models using a balanced dataset across groups. Additionally, we account for fairness constraints and integrate class-balanced fine-tuning for multi-label settings. The evaluation of our method on the MIMIC-CXR dataset demonstrates that our framework achieves an optimal tradeoff between accuracy and fairness compared to baseline methods.
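The abstract describes retraining the last classification layer on a dataset balanced across groups. The sketch below illustrates only the balancing step, under the assumption that balance is obtained by downsampling every group to the size of the smallest one. The function name `balanced_subset` and the example group labels are hypothetical, and the paper may balance its data differently.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Sequence


def balanced_subset(groups: Sequence[Hashable], seed: int = 0) -> List[int]:
    """Return sample indices with an equal number of samples per group.

    Every group is randomly downsampled to the size of the smallest
    group. Group labels must be mutually comparable (for example, all
    strings or all tuples of strings) so they can be sorted.
    """
    by_group = defaultdict(list)
    for index, group in enumerate(groups):
        by_group[group].append(index)
    per_group = min(len(indices) for indices in by_group.values())
    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    chosen: List[int] = []
    for group in sorted(by_group):
        chosen.extend(rng.sample(by_group[group], per_group))
    return sorted(chosen)


# Hypothetical intersectional groups encoded as tuples of attribute values.
labels = [("g1", "x")] * 5 + [("g1", "y")] * 2 + [("g2", "x")] * 3
subset = balanced_subset(labels)
```

The returned indices could then select the features and labels used to refit the final layer while the rest of the pre-trained network stays frozen.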
Early Diagnosis of Chronic Obstructive Pulmonary Disease from Chest X-Rays using Transfer Learning and Fusion Strategies
Wang, Ryan, Chen, Li-Ching, Moukheiber, Lama, Moukheiber, Mira, Moukheiber, Dana, Zaiman, Zach, Moukheiber, Sulaiman, Litchman, Tess, Seastedt, Kenneth, Trivedi, Hari, Steinberg, Rebecca, Kuo, Po-Chih, Gichoya, Judy, Celi, Leo Anthony
Chronic obstructive pulmonary disease (COPD) is one of the most common chronic illnesses in the world and the third leading cause of mortality worldwide. It is often underdiagnosed or not diagnosed until later in the disease course. Spirometry tests are the gold standard for diagnosing COPD but can be difficult to obtain, especially in resource-poor countries. Chest X-rays (CXRs), however, are readily available and may serve as a screening tool to identify patients with COPD who should undergo further testing. Currently, no research applies deep learning (DL) algorithms that use large multi-site and multi-modal data to detect COPD patients and evaluate fairness across demographic groups. We use three CXR datasets in our study: CheXpert to pre-train models, MIMIC-CXR to develop them, and Emory-CXR to validate them. CXRs from patients who are in the early stage of COPD and not on mechanical ventilation are selected for model training and validation. We visualize the Grad-CAM heatmaps of the true positive cases on the base model for both MIMIC-CXR and Emory-CXR test datasets. We further propose two fusion schemes to improve overall model performance: (1) model-level fusion, including bagging and stacking methods using MIMIC-CXR, and (2) data-level fusion, including multi-site data using MIMIC-CXR and Emory-CXR, and multi-modal data using MIMIC-CXR and MIMIC-IV EHR. Fairness analysis is performed to evaluate whether the fusion schemes perform differently across demographic groups. The results demonstrate that DL models can detect COPD using CXRs, which can facilitate early screening, especially in low-resource regions where CXRs are more accessible than spirometry. The multi-site data fusion scheme could improve model generalizability on the Emory-CXR test data. The use of CXRs or other modalities to predict COPD should be studied further in future work.
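Model-level fusion by bagging, as mentioned in the abstract, generally means training several models on bootstrap resamples of the training set and averaging their predictions. The sketch below shows those two generic pieces in isolation. The function names and the toy probabilities are assumptions, and the paper's actual bagging and stacking configurations are not specified here.

```python
import random
from typing import List, Sequence


def bootstrap_indices(n_samples: int, rng: random.Random) -> List[int]:
    """Draw n_samples indices with replacement (one bootstrap resample)."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def average_probabilities(model_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average per-sample probabilities across models (bagging-style fusion).

    `model_probs[m][i]` is model m's predicted probability for sample i.
    """
    n_models = len(model_probs)
    return [sum(column) / n_models for column in zip(*model_probs)]


# One resample per hypothetical base model; each would train on its own draw.
rng = random.Random(0)
resamples = [bootstrap_indices(10, rng) for _ in range(3)]

# Toy probabilities from two hypothetical models on two samples.
fused = average_probabilities([[0.2, 0.8], [0.4, 0.6]])
```

Stacking would replace the plain average with a second-stage model trained on the base models' outputs.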