TIMER: Temporal Instruction Modeling and Evaluation for Longitudinal Clinical Records
Cui, Hejie; Unell, Alyssa; Chen, Bowen; Fries, Jason Alan; Alsentzer, Emily; Koyejo, Sanmi; Shah, Nigam
–arXiv.org Artificial Intelligence
Large language models (LLMs) have emerged as promising tools for assisting in medical tasks, yet processing Electronic Health Records (EHRs) presents unique challenges due to their longitudinal nature. While LLMs' capabilities to perform medical tasks continue to improve, their ability to reason over temporal dependencies across multiple patient visits and time frames remains unexplored. We introduce TIMER (Temporal Instruction Modeling and Evaluation for Longitudinal Clinical Records), a framework that incorporates instruction-response pairs grounded in different parts of a patient's record as a critical dimension in both instruction evaluation and tuning for longitudinal clinical records. We develop TIMER-Bench, the first time-aware benchmark that evaluates temporal reasoning capabilities over longitudinal EHRs, as well as TIMER-Instruct, an instruction-tuning methodology for LLMs to learn reasoning over time. We demonstrate that ...
Tasks such as chronic disease management, multi-visit care planning, and patient history synthesis require clinicians to understand complex relationships between different record entries and how past events influence current and future clinical decisions (Wornow et al., 2024). The cognitive demands of processing such lengthy documentation are significant. While biomedical LLMs have shown promising results on well-structured tasks like answering USMLE questions and medical knowledge retrieval (Singhal et al., 2023; Lu et al., 2024; Lucas et al., 2024), recent evaluations reveal significant limitations in processing longitudinal patient information and in making clinical decisions over time (Hager et al., 2024; Bedi et al., 2024). The gap between isolated question-answering performance and temporal reasoning ability limits the practical utility of LLMs in healthcare. While some prior work has explored the temporal understanding abilities of general LLMs (Wang & Zhao, 2024; Fatemi et al., 2024; Herel et al., 2024), how these capabilities scale to longer contexts remains understudied, particularly in healthcare, where longitudinal reasoning is important.
Mar-6-2025