Trivedi, Hari
Novel AI-Based Quantification of Breast Arterial Calcification to Predict Cardiovascular Risk
Dapamede, Theodorus, Urooj, Aisha, Joshi, Vedant, Gershon, Gabrielle, Li, Frank, Chavoshi, Mohammadreza, Brown-Mulry, Beatrice, Isaac, Rohan Satya, Mansuri, Aawez, Robichaux, Chad, Ayoub, Chadi, Arsanjani, Reza, Sperling, Laurence, Gichoya, Judy, van Assen, Marly, ONeill, Charles W., Banerjee, Imon, Trivedi, Hari
IMPORTANCE Women are underdiagnosed and undertreated for cardiovascular disease. Automatic quantification of breast arterial calcification (BAC) on screening mammography can identify women at risk for cardiovascular disease and enable earlier treatment and management of disease. OBJECTIVE To determine whether artificial intelligence-based automatic quantification of BAC from screening mammograms predicts cardiovascular disease and mortality in a large, racially diverse, multi-institutional population, both independently and beyond traditional risk factors and atherosclerotic cardiovascular disease (ASCVD) risk scores. DESIGN, SETTING, AND PARTICIPANTS Retrospective cohort study of 116,135 women from two healthcare systems (Emory Healthcare and Mayo Clinic Enterprise) who had screening mammograms and either experienced a major adverse cardiovascular event or death or had at least 5 years of clinical follow-up. BAC was quantified using a novel transformer-based neural network architecture for semantic segmentation. BAC severity was categorized into four groups (no BAC, mild, moderate, and severe), with outcomes assessed using Kaplan-Meier analysis and Cox proportional-hazards models. MAIN OUTCOMES AND MEASURES Major adverse cardiovascular events (MACE), including acute myocardial infarction, stroke, heart failure, and all-cause mortality, adjusted for traditional risk factors and ASCVD risk scores. RESULTS BAC severity was independently associated with MACE after adjusting for cardiovascular risk factors, with increasing hazard ratios from mild (HR 1.18-1.22),
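A minimal sketch of the survival analysis this abstract describes, assuming a hypothetical per-patient table; the file name and the column names (bac_severity, ascvd_score, time_years, mace) are illustrative, not the study's actual variables.

```python
# Sketch only: Kaplan-Meier curves and an adjusted Cox model by BAC severity.
import pandas as pd
from lifelines import CoxPHFitter, KaplanMeierFitter

df = pd.read_csv("bac_cohort.csv")  # hypothetical cohort export

# Order the four BAC severity groups so "none" is the reference category.
df["bac_severity"] = pd.Categorical(
    df["bac_severity"], categories=["none", "mild", "moderate", "severe"])

# Kaplan-Meier survival curves stratified by BAC severity.
kmf = KaplanMeierFitter()
for group, sub in df.groupby("bac_severity", observed=True):
    kmf.fit(sub["time_years"], event_observed=sub["mace"], label=str(group))
    kmf.plot_survival_function()

# Cox proportional-hazards model adjusted for traditional risk factors and
# the ASCVD score, as in the abstract ("no BAC" is the dropped reference).
model_df = pd.get_dummies(df, columns=["bac_severity"], drop_first=True, dtype=int)
covariates = [c for c in model_df.columns if c.startswith("bac_severity_")]
covariates += ["age", "hypertension", "diabetes", "smoking", "ascvd_score"]
cph = CoxPHFitter()
cph.fit(model_df[covariates + ["time_years", "mace"]],
        duration_col="time_years", event_col="mace")
cph.print_summary()  # hazard ratios for mild/moderate/severe BAC vs. no BAC
```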
Emory Knee Radiograph (MRKR) Dataset
Price, Brandon, Adleberg, Jason, Thomas, Kaesha, Zaiman, Zach, Mansuri, Aawez, Brown-Mulry, Beatrice, Okecheukwu, Chima, Gichoya, Judy, Trivedi, Hari
The Emory Knee Radiograph (MRKR) dataset is a large, demographically diverse collection of 503,261 knee radiographs from 83,011 patients, 40% of whom are African American. This dataset provides imaging data in DICOM format along with detailed clinical information, including patient-reported pain scores, diagnostic codes, and procedural codes, which are not commonly available in similar datasets. The MRKR dataset also features imaging metadata such as image laterality, view type, and presence of hardware, enhancing its value for research and model development. MRKR addresses significant gaps in existing datasets by offering a more representative sample for studying osteoarthritis and related outcomes, particularly among minority populations, thereby providing a valuable resource for clinicians and researchers.
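A brief sketch of how a radiograph and its imaging metadata from a DICOM dataset like MRKR might be loaded with pydicom and joined to tabular clinical data; the file paths and CSV name are hypothetical, not part of the dataset's documented layout.

```python
# Sketch only: read one knee radiograph and its standard DICOM attributes.
import pydicom
import pandas as pd

ds = pydicom.dcmread("mrkr/images/example_knee.dcm")  # hypothetical path
pixel_array = ds.pixel_array                          # raw image for modeling

# Standard DICOM attributes that typically carry the metadata described above.
laterality = ds.get("ImageLaterality", "unknown")
view = ds.get("ViewPosition", "unknown")

# Join against tabular clinical data (pain scores, diagnosis/procedure codes).
clinical = pd.read_csv("mrkr/clinical_data.csv")      # hypothetical filename
```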
Hierarchical Classification System for Breast Cancer Specimen Report (HCSBC) -- an end-to-end model for characterizing severity and diagnosis
Santos, Thiago, Kamath, Harish, McAdams, Christopher R., Newell, Mary S., Mosunjac, Marina, Oprea-Ilies, Gabriela, Smith, Geoffrey, Lehman, Constance, Gichoya, Judy, Banerjee, Imon, Trivedi, Hari
Automated classification of cancer pathology reports can extract information from unstructured reports and categorize each report into structured diagnosis and severity categories. Such a system can reduce the burden of populating tumor registries, aid enrollment in clinical trials, and support the creation of large datasets with true pathologic ground truth for deep learning model development. However, breast pathology reports are difficult to categorize due to high linguistic variability in content and a wide variety of potential diagnoses (>50). Existing NLP models focus primarily on classifiers for primary breast cancer types (e.g., IDC, DCIS, ILC) and tumor characteristics, and ignore rarer cancer subtypes. We developed a hierarchical hybrid transformer-based pipeline with 59 labels, the Hierarchical Classification System for Breast Cancer Specimen Report (HCSBC), which leverages context-preserving transformer-based NLP, and compared it to several state-of-the-art ML and DL models. We trained the model on EUH data and evaluated its performance on two external datasets, MGH and Mayo Clinic. We publicly release the code and a live application in a Hugging Face Spaces repository.
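The following is a minimal sketch of a two-stage hierarchical pipeline in the spirit of HCSBC, not the released implementation; the checkpoint paths, label names, and routing keys are placeholders for fine-tuned models.

```python
# Sketch only: stage 1 assigns a coarse severity category, stage 2 routes the
# report to a severity-specific classifier for the fine-grained diagnosis.
from transformers import pipeline

severity_clf = pipeline("text-classification", model="path/to/severity-model")
diagnosis_clfs = {
    "malignant": pipeline("text-classification", model="path/to/malignant-model"),
    "benign": pipeline("text-classification", model="path/to/benign-model"),
}

def classify_report(report_text: str) -> dict:
    # Stage 1: coarse severity category for the specimen report.
    severity = severity_clf(report_text, truncation=True)[0]["label"]
    # Stage 2: severity-specific classifier produces the final diagnosis
    # (one of the 59 labels in the full system).
    diagnosis = diagnosis_clfs[severity](report_text, truncation=True)[0]["label"]
    return {"severity": severity, "diagnosis": diagnosis}
```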
Multivariate Analysis on Performance Gaps of Artificial Intelligence Models in Screening Mammography
Zhang, Linglin, Brown-Mulry, Beatrice, Nalla, Vineela, Hwang, InChan, Gichoya, Judy Wawira, Gastounioti, Aimilia, Banerjee, Imon, Seyyed-Kalantari, Laleh, Woo, MinJae, Trivedi, Hari
Although deep learning models for abnormality classification can perform well in screening mammography, the demographic, imaging, and clinical characteristics associated with increased risk of model failure remain unclear. This retrospective study uses the Emory BrEast Imaging Dataset (EMBED), containing mammograms from 115,931 patients imaged at Emory Healthcare between 2013 and 2020, with BI-RADS assessments, region-of-interest coordinates for abnormalities, imaging features, pathologic outcomes, and patient demographics. Multiple deep learning models were trained to distinguish between abnormal tissue patches and randomly selected normal tissue patches from screening mammograms. We assessed model performance in subgroups defined by age, race, pathologic outcome, tissue density, and imaging characteristics, and investigated their associations with false negatives (FN) and false positives (FP). We also performed multivariate logistic regression to control for confounding between subgroups. The top-performing model, ResNet152V2, achieved an accuracy of 92.6% (95% CI = 92.0-93.2%) and an AUC of 0.975 (95% CI = 0.972-0.978). Before controlling for confounding, nearly all subgroups showed statistically significant differences in model performance. After controlling for confounding, however, lower FN risk was associated with Other race (RR = 0.828; p = .050), biopsy-proven benign lesions (RR = 0.927; p = .011), and mass (RR = 0.921; p = .010) or asymmetry (RR = 0.854; p = .040) findings; higher FN risk was associated with architectural distortion (RR = 1.037; p < .001). Higher FP risk was associated with BI-RADS density C (RR = 1.891; p < .001) and D (RR = 2.486; p < .001). Our results demonstrate that subgroup analysis is important when evaluating mammogram classifier performance, and that controlling for confounding between subgroups elucidates the true associations between variables and model failure. These results can help guide the development of future breast cancer detection models.
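A hedged sketch of the kind of multivariate analysis described above: a log-binomial regression of false-negative outcomes on subgroup indicators yields adjusted relative risks. The dataframe, file name, and column names are hypothetical, and the study's exact model specification may differ.

```python
# Sketch only: adjusted relative risks of false negatives by subgroup.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

preds = pd.read_csv("embed_patch_predictions.csv")  # hypothetical predictions file

# false_negative is 1 when an abnormal patch was classified as normal.
model = smf.glm(
    "false_negative ~ C(race) + C(pathology) + C(finding_type) + C(density) + C(age_group)",
    data=preds,
    family=sm.families.Binomial(link=sm.families.links.Log()),
).fit()

# Exponentiated coefficients are adjusted relative risks for each subgroup.
print(np.exp(model.params))
print(model.pvalues)
```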
Early Diagnosis of Chronic Obstructive Pulmonary Disease from Chest X-Rays using Transfer Learning and Fusion Strategies
Wang, Ryan, Chen, Li-Ching, Moukheiber, Lama, Moukheiber, Mira, Moukheiber, Dana, Zaiman, Zach, Moukheiber, Sulaiman, Litchman, Tess, Seastedt, Kenneth, Trivedi, Hari, Steinberg, Rebecca, Kuo, Po-Chih, Gichoya, Judy, Celi, Leo Anthony
Chronic obstructive pulmonary disease (COPD) is one of the most common chronic illnesses in the world and the third leading cause of mortality worldwide. It is often underdiagnosed or not diagnosed until later in the disease course. Spirometry tests are the gold standard for diagnosing COPD but can be difficult to obtain, especially in resource-poor countries. Chest X-rays (CXRs), however, are readily available and may serve as a screening tool to identify patients with COPD who should undergo further testing. To date, no research has applied deep learning (DL) algorithms that use large multi-site and multi-modal data to detect COPD patients and evaluate fairness across demographic groups. We use three CXR datasets in our study: CheXpert to pre-train models, MIMIC-CXR to develop models, and Emory-CXR to validate them. CXRs from patients in the early stage of COPD and not on mechanical ventilation are selected for model training and validation. We visualize Grad-CAM heatmaps of true positive cases from the base model on both the MIMIC-CXR and Emory-CXR test datasets. We further propose two fusion schemes to improve overall model performance: (1) model-level fusion, including bagging and stacking methods using MIMIC-CXR, and (2) data-level fusion, including multi-site fusion using MIMIC-CXR and Emory-CXR and multi-modal fusion using MIMIC-CXR and MIMIC-IV EHR data. A fairness analysis evaluates whether the fusion schemes introduce performance discrepancies among demographic groups. The results demonstrate that DL models can detect COPD from CXRs, which can facilitate early screening, especially in low-resource regions where CXRs are more accessible than spirometry. The multi-site data fusion scheme improved model generalizability on the Emory-CXR test data. Further studies on using CXRs or other modalities to predict COPD are warranted.
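A minimal sketch of the model-level (stacking) fusion scheme described above, not the authors' implementation: COPD probabilities from several base CXR models are combined by a logistic-regression meta-learner. The file names and array shapes are assumptions.

```python
# Sketch only: stack base-model probabilities into a single fused prediction.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Hypothetical shapes: N studies x M base models (e.g., different CNN backbones).
base_model_probs = np.load("mimic_cxr_base_model_probs.npy")  # (N, M)
labels = np.load("mimic_cxr_copd_labels.npy")                 # (N,)

X_train, X_val, y_train, y_val = train_test_split(
    base_model_probs, labels, test_size=0.2, stratify=labels, random_state=0)

# Meta-learner combines the base models' probabilities (stacking).
stacker = LogisticRegression(max_iter=1000)
stacker.fit(X_train, y_train)
fused_probs = stacker.predict_proba(X_val)[:, 1]
print("Stacked AUC:", roc_auc_score(y_val, fused_probs))
```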