clmbr-t-base
Context Clues: Evaluating Long Context Models for Clinical Prediction Tasks on EHRs
Wornow, Michael, Bedi, Suhana, Hernandez, Miguel Angel Fuentes, Steinberg, Ethan, Fries, Jason Alan, Ré, Christopher, Koyejo, Sanmi, Shah, Nigam H.
Foundation Models (FMs) trained on Electronic Health Records (EHRs) have achieved state-of-the-art results on numerous clinical prediction tasks. However, most existing EHR FMs have context windows of <1k tokens. This prevents them from modeling full patient EHRs which can exceed 10k's of events. Recent advancements in subquadratic long-context architectures (e.g., Mamba) offer a promising solution. However, their application to EHR data has not been well-studied. We address this gap by presenting the first systematic evaluation of the effect of context length on modeling EHR data. We find that longer context models improve predictive performance -- our Mamba-based model surpasses the prior state-of-the-art on 9/14 tasks on the EHRSHOT prediction benchmark. For clinical applications, however, model performance alone is insufficient -- robustness to the unique properties of EHR is crucial. Thus, we also evaluate models across three previously underexplored properties of EHR data: (1) the prevalence of "copy-forwarded" diagnoses which creates artificial repetition of tokens within EHR sequences; (2) the irregular time intervals between EHR events which can lead to a wide range of timespans within a context window; and (3) the natural increase in disease complexity over time which makes later tokens in the EHR harder to predict than earlier ones. Stratifying our EHRSHOT results, we find that higher levels of each property correlate negatively with model performance, but that longer context models are more robust to more extreme levels of these properties. Our work highlights the potential for using long-context architectures to model EHR data, and offers a case study for identifying new challenges in modeling sequential data motivated by domains outside of natural language. We release our models and code at: https://github.com/som-shahlab/long_context_clues
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.68)
- Health & Medicine > Therapeutic Area > Oncology (0.96)
- Health & Medicine > Therapeutic Area > Internal Medicine (0.72)
- Health & Medicine > Health Care Technology > Medical Record (0.69)
EHRSHOT: An EHR Benchmark for Few-Shot Evaluation of Foundation Models
Wornow, Michael, Thapa, Rahul, Steinberg, Ethan, Fries, Jason A., Shah, Nigam H.
While the general machine learning (ML) community has benefited from public datasets, tasks, and models, the progress of ML in healthcare has been hampered by a lack of such shared assets. The success of foundation models creates new challenges for healthcare ML by requiring access to shared pretrained models to validate performance benefits. We help address these challenges through three contributions. First, we publish a new dataset, EHRSHOT, which contains deidentified structured data from the electronic health records (EHRs) of 6,739 patients from Stanford Medicine. Unlike MIMIC-III/IV and other popular EHR datasets, EHRSHOT is longitudinal and not restricted to ICU/ED patients. Second, we publish the weights of CLMBR-T-base, a 141M parameter clinical foundation model pretrained on the structured EHR data of 2.57M patients. We are one of the first to fully release such a model for coded EHR data; in contrast, most prior models released for clinical data (e.g. GatorTron, ClinicalBERT) only work with unstructured text and cannot process the rich, structured data within an EHR. We provide an end-to-end pipeline for the community to validate and build upon its performance. Third, we define 15 few-shot clinical prediction tasks, enabling evaluation of foundation models on benefits such as sample efficiency and task adaptation. Our model and dataset are available via a research data use agreement from our website: https://ehrshot.stanford.edu. Code to reproduce our results are available at our Github repo: https://github.com/som-shahlab/ehrshot-benchmark
- North America > United States > California > Santa Clara County > Palo Alto (0.25)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Asia > Middle East > Israel (0.04)
- Research Report > New Finding (0.88)
- Research Report > Experimental Study (0.86)
- Health & Medicine > Therapeutic Area (1.00)
- Health & Medicine > Health Care Technology > Medical Record (1.00)
- Health & Medicine > Health Care Providers & Services (1.00)
- Health & Medicine > Diagnostic Medicine (0.88)