Piening, Brian
Universal Abstraction: Harnessing Frontier Models to Structure Real-World Data at Scale
Wong, Cliff, Preston, Sam, Liu, Qianchu, Gero, Zelalem, Bagga, Jass, Zhang, Sheng, Jain, Shrey, Zhao, Theodore, Gu, Yu, Xu, Yanbo, Kiblawi, Sid, Weerasinghe, Roshanthi, Leidner, Rom, Young, Kristina, Piening, Brian, Bifulco, Carlo, Naumann, Tristan, Wei, Mu, Poon, Hoifung
The vast majority of real-world patient information resides in unstructured clinical text, and the process of medical abstraction seeks to extract and normalize structured information from this unstructured input. However, traditional medical abstraction methods can require significant manual effort, such as crafting rules or annotating training labels, which limits scalability. In this paper, we propose UniMedAbstractor (UMA), a zero-shot medical abstraction framework leveraging Large Language Models (LLMs) through a modular and customizable prompt template. We refer to our approach as universal abstraction because it can quickly scale to new attributes through its universal prompt template without curating attribute-specific training labels or rules. We evaluate UMA for oncology applications, focusing on fifteen key attributes representing the cancer patient journey, from short-context attributes (e.g., performance status, treatment) to complex long-context attributes requiring longitudinal reasoning (e.g., tumor site, histology, TNM staging). Experiments on real-world data show UMA's strong performance and generalizability. Compared to supervised and heuristic baselines, UMA with GPT-4o achieves on average an absolute 2-point F1/accuracy improvement for both short-context and long-context attribute abstraction. For pathologic T staging, UMA even outperforms the supervised model by 20 points in accuracy.
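As a rough illustration of what a modular, attribute-agnostic prompt template could look like, the following Python sketch assembles a zero-shot prompt from a per-attribute specification. It is not the UMA template itself: the `AttributeSpec` class, its fields, the `build_prompt` function, and the prompt wording are all hypothetical names and text chosen for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AttributeSpec:
    """Hypothetical description of one attribute to abstract."""
    name: str
    definition: str
    allowed_values: List[str] = field(default_factory=list)
    guidelines: List[str] = field(default_factory=list)


def build_prompt(spec: AttributeSpec, note_text: str) -> str:
    """Fill one shared template with attribute-specific modules.

    Optional modules (allowed values, guidelines) are included only
    when the specification provides them.
    """
    sections = [
        "Task: extract the attribute below from the clinical text.",
        f"Attribute: {spec.name}",
        f"Definition: {spec.definition}",
    ]
    if spec.allowed_values:
        sections.append("Allowed values: " + ", ".join(spec.allowed_values))
    if spec.guidelines:
        bullet_list = "\n".join(f"- {g}" for g in spec.guidelines)
        sections.append("Guidelines:\n" + bullet_list)
    sections.append("Clinical text:\n" + note_text)
    sections.append("Answer with one allowed value, or 'unknown' if not stated.")
    return "\n\n".join(sections)


# Illustrative specification; the values and guideline are made up.
ecog = AttributeSpec(
    name="performance status",
    definition="ECOG performance status recorded for the patient.",
    allowed_values=["0", "1", "2", "3", "4"],
    guidelines=["Prefer the most recent value when several are documented."],
)
prompt = build_prompt(ecog, "Patient seen in clinic today. ECOG 1.")
```

Under this design, supporting a new attribute means writing a new `AttributeSpec` rather than new rules or training labels; the resulting string would then be sent to whichever LLM is in use.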
Applying Large Language Models for Causal Structure Learning in Non Small Cell Lung Cancer
Naik, Narmada, Khandelwal, Ayush, Joshi, Mohit, Atre, Madhusudan, Wright, Hollis, Kannan, Kavya, Hill, Scott, Mamidipudi, Giridhar, Srinivasa, Ganapati, Bifulco, Carlo, Piening, Brian, Matlock, Kevin
Causal discovery is becoming a key part of medical AI research. Causal discovery methods can enhance healthcare by identifying causal links between biomarkers, demographics, treatments, and outcomes, and they can help medical professionals choose more impactful treatments and strategies. In parallel, Large Language Models (LLMs) have shown great potential in identifying patterns and generating insights from text data. In this paper, we investigate applying LLMs to the problem of determining the directionality of edges in causal discovery. Specifically, we test our approach on a deidentified set of Non Small Cell Lung Cancer (NSCLC) patients who have both electronic health record and genomic panel data. Graphs are validated with Bayesian Dirichlet estimators on tabular data. Our results show that LLMs can accurately predict the directionality of edges in causal graphs, outperforming existing state-of-the-art methods. These findings suggest that LLMs can play a significant role in advancing causal discovery and help us better understand complex systems.
TRIALSCOPE: A Unifying Causal Framework for Scaling Real-World Evidence Generation with Biomedical Language Models
González, Javier, Wong, Cliff, Gero, Zelalem, Bagga, Jass, Ueno, Risa, Chien, Isabel, Oravkin, Eduard, Kiciman, Emre, Nori, Aditya, Weerasinghe, Roshanthi, Leidner, Rom S., Piening, Brian, Naumann, Tristan, Bifulco, Carlo, Poon, Hoifung
The rapid digitization of real-world data offers an unprecedented opportunity for optimizing healthcare delivery and accelerating biomedical discovery. In practice, however, such data is most abundantly available in unstructured forms, such as clinical notes in electronic medical records (EMRs), and it is generally plagued by confounders. In this paper, we present TRIALSCOPE, a unifying framework for distilling real-world evidence from population-level observational data. TRIALSCOPE leverages biomedical language models to structure clinical text at scale, employs advanced probabilistic modeling for denoising and imputation, and incorporates state-of-the-art causal inference techniques to combat common confounders. Using clinical trial specifications as a generic representation, TRIALSCOPE provides a turn-key solution for generating and reasoning with clinical hypotheses using observational data. In extensive experiments and analyses on a large-scale real-world dataset with over one million cancer patients from a large US healthcare network, we show that TRIALSCOPE can produce high-quality structuring of real-world data and generate results comparable to those of marquee cancer trials. In addition to facilitating in silico clinical trial design and optimization, TRIALSCOPE may be used to empower synthetic controls, pragmatic trials, and post-market surveillance, and to support fine-grained patient-like-me reasoning in precision diagnosis and treatment.
Scaling Clinical Trial Matching Using Large Language Models: A Case Study in Oncology
Wong, Cliff, Zhang, Sheng, Gu, Yu, Moung, Christine, Abel, Jacob, Usuyama, Naoto, Weerasinghe, Roshanthi, Piening, Brian, Naumann, Tristan, Bifulco, Carlo, Poon, Hoifung
Clinical trial matching is a key process in health delivery and discovery. In practice, it is plagued by overwhelming unstructured data and unscalable manual processing. In this paper, we conduct a systematic study on scaling clinical trial matching using large language models (LLMs), with oncology as the focus area. Our study is grounded in a clinical trial matching system currently in test deployment at a large U.S. health network. Initial findings are promising: out of the box, cutting-edge LLMs such as GPT-4 can already structure elaborate eligibility criteria of clinical trials and extract complex matching logic (e.g., nested AND/OR/NOT). While still far from perfect, LLMs substantially outperform prior strong baselines and may serve as a preliminary solution to help triage patient-trial candidates with humans in the loop. Our study also reveals a few significant growth areas for applying LLMs to end-to-end clinical trial matching, such as context limitation and accuracy, especially in structuring patient information from longitudinal medical records.
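To make the idea of nested AND/OR/NOT matching logic concrete, the following Python sketch evaluates a structured eligibility criterion against a structured patient record. It is an illustration only, not the system described in the paper: the dictionary schema (`"and"`, `"or"`, `"not"`, `"attribute"`, `"equals"`), the `matches` function, and the example criterion and patients are all hypothetical.

```python
from typing import Any, Dict

# Hypothetical schema: a criterion is either an operator node
# ({"and": [...]}, {"or": [...]}, {"not": {...}}) or a leaf
# ({"attribute": name, "equals": value}).
Criterion = Dict[str, Any]


def matches(criterion: Criterion, patient: Dict[str, Any]) -> bool:
    """Recursively evaluate a nested AND/OR/NOT criterion."""
    if "and" in criterion:
        return all(matches(c, patient) for c in criterion["and"])
    if "or" in criterion:
        return any(matches(c, patient) for c in criterion["or"])
    if "not" in criterion:
        return not matches(criterion["not"], patient)
    # Leaf node: an attribute missing from the record does not match.
    return patient.get(criterion["attribute"]) == criterion["equals"]


# Made-up criterion: NSCLC AND (stage III OR stage IV) AND NOT prior immunotherapy.
criterion = {
    "and": [
        {"attribute": "diagnosis", "equals": "NSCLC"},
        {"or": [
            {"attribute": "stage", "equals": "III"},
            {"attribute": "stage", "equals": "IV"},
        ]},
        {"not": {"attribute": "prior_immunotherapy", "equals": True}},
    ]
}

patient_a = {"diagnosis": "NSCLC", "stage": "IV", "prior_immunotherapy": False}
patient_b = {"diagnosis": "NSCLC", "stage": "IV", "prior_immunotherapy": True}

result_a = matches(criterion, patient_a)
result_b = matches(criterion, patient_b)
```

In this sketch a missing attribute makes a leaf evaluate to false, so a NOT over a missing attribute evaluates to true. A triage setting with humans in the loop might prefer a three-valued result (match, no match, unknown) so that incomplete records are flagged for review rather than silently included or excluded.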