Blankemeier, Louis
Identifying Spurious Correlations using Counterfactual Alignment
Cohen, Joseph Paul, Blankemeier, Louis, Chaudhari, Akshay
Models driven by spurious correlations often yield poor generalization performance. We propose the counterfactual alignment method to detect and explore spurious correlations of black box classifiers. Counterfactual images generated with respect to one classifier can be input into other classifiers to see if they also induce changes in the outputs of these classifiers. The relationship between these responses can be quantified and used to identify specific instances where a spurious correlation exists as well as compute aggregate statistics over a dataset. Our work demonstrates the ability to detect spurious correlations in face attribute classifiers. This is validated by observing intuitive trends in a face attribute classifier as well as fabricating spurious correlations and detecting their presence, both visually and quantitatively. Further, utilizing the CF alignment method, we demonstrate that we can rectify spurious correlations identified in classifiers.
Clinical Text Summarization: Adapting Large Language Models Can Outperform Human Experts
Van Veen, Dave, Van Uden, Cara, Blankemeier, Louis, Delbrouck, Jean-Benoit, Aali, Asad, Bluethgen, Christian, Pareek, Anuj, Polacin, Malgorzata, Reis, Eduardo Pontes, Seehofnerova, Anna, Rohatgi, Nidhi, Hosamani, Poonam, Collins, William, Ahuja, Neera, Langlotz, Curtis P., Hom, Jason, Gatidis, Sergios, Pauly, John, Chaudhari, Akshay S.
Sifting through vast textual data and summarizing key information from electronic health records (EHR) imposes a substantial burden on how clinicians allocate their time. Although large language models (LLMs) have shown immense promise in natural language processing (NLP) tasks, their efficacy on a diverse range of clinical summarization tasks has not yet been rigorously demonstrated. In this work, we apply domain adaptation methods to eight LLMs, spanning six datasets and four distinct clinical summarization tasks: radiology reports, patient questions, progress notes, and doctor-patient dialogue. Our thorough quantitative assessment reveals trade-offs between models and adaptation methods in addition to instances where recent advances in LLMs may not improve results. Further, in a clinical reader study with ten physicians, we show that summaries from our best-adapted LLMs are preferable to human summaries in terms of completeness and correctness. Our ensuing qualitative analysis highlights challenges faced by both LLMs and human experts. Lastly, we correlate traditional quantitative NLP metrics with reader study scores to enhance our understanding of how these metrics align with physician preferences. Our research marks the first evidence of LLMs outperforming human experts in clinical text summarization across multiple tasks. This implies that integrating LLMs into clinical workflows could alleviate documentation burden, empowering clinicians to focus more on personalized patient care and the inherently human aspects of medicine.
Optimizing Audio Augmentations for Contrastive Learning of Health-Related Acoustic Signals
Blankemeier, Louis, Baur, Sebastien, Weng, Wei-Hung, Garrison, Jake, Matias, Yossi, Prabhakara, Shruthi, Ardila, Diego, Nabulsi, Zaid
Health-related acoustic signals, such as cough and breathing sounds, are relevant for medical diagnosis and continuous health monitoring. Most existing machine learning approaches for health acoustics are trained and evaluated on specific tasks, limiting their generalizability across various healthcare applications. In this paper, we leverage a self-supervised learning framework, SimCLR with a Slowfast NFNet backbone, for contrastive learning of health acoustics. A crucial aspect of optimizing Slowfast NFNet for this application lies in identifying effective audio augmentations. We conduct an in-depth analysis of various audio augmentation strategies and demonstrate that an appropriate augmentation strategy enhances the performance of the Slowfast NFNet audio encoder across a diverse set of health acoustic tasks. Our findings reveal that when augmentations are combined, they can produce synergistic effects that exceed the benefits seen when each is applied individually.
Comp2Comp: Open-Source Body Composition Assessment on Computed Tomography
Blankemeier, Louis, Desai, Arjun, Chaves, Juan Manuel Zambrano, Wentland, Andrew, Yao, Sally, Reis, Eduardo, Jensen, Malte, Bahl, Bhanushree, Arora, Khushboo, Patel, Bhavik N., Lenchik, Leon, Willis, Marc, Boutin, Robert D., Chaudhari, Akshay S.
Computed tomography (CT) is routinely used in clinical practice to evaluate a wide variety of medical conditions. While CT scans provide diagnoses, they also offer the ability to extract quantitative body composition metrics to analyze tissue volume and quality. Extracting quantitative body composition measures manually from CT scans is a cumbersome and time-consuming task. Proprietary software has been developed recently to automate this process, but the closed-source nature impedes widespread use. There is a growing need for fully automated body composition software that is more accessible and easier to use, especially for clinicians and researchers who are not experts in medical image processing. To this end, we have built Comp2Comp, an open-source Python package for rapid and automated body composition analysis of CT scans. This package offers models, post-processing heuristics, body composition metrics, automated batching, and polychromatic visualizations. Comp2Comp currently computes body composition measures for bone, skeletal muscle, visceral adipose tissue, and subcutaneous adipose tissue on CT scans of the abdomen. We have created two pipelines for this purpose. The first pipeline computes vertebral measures, as well as muscle and adipose tissue measures, at the T12 - L5 vertebral levels from abdominal CT scans. The second pipeline computes muscle and adipose tissue measures on user-specified 2D axial slices. In this guide, we discuss the architecture of the Comp2Comp pipelines, provide usage instructions, and report internal and external validation results to measure the quality of segmentations and body composition measures. Comp2Comp can be found at https://github.com/StanfordMIMI/Comp2Comp.