Mind the Missing: Variable-Aware Representation Learning for Irregular EHR Time Series using Large Language Models
Kwon, Jeong Eul, Yoon, Joo Heung, Lee, Hyo Kyung
arXiv.org Artificial Intelligence
Irregular sampling and high missingness are intrinsic challenges in modeling time series derived from electronic health records (EHRs), where clinical variables are measured at uneven intervals depending on workflow and intervention timing. To address this, we propose VITAL -- a variable-aware, large language model (LLM)-based framework tailored for learning from irregularly sampled physiological time series. VITAL differentiates between two distinct types of clinical variables: vital signs, which are frequently recorded and exhibit temporal patterns, and laboratory tests, which are measured sporadically and lack temporal structure. It reprograms vital signs into the language space, enabling the LLM to capture temporal context and reason over missing values through explicit encoding. In contrast, laboratory variables are embedded using either representative summary values or a learnable [Not measured] token, depending on their availability. Extensive evaluations on benchmark datasets from PhysioNet demonstrate that VITAL outperforms state-of-the-art methods designed for irregular time series. Furthermore, it maintains robust performance under high levels of missingness, which is prevalent in real-world clinical scenarios where key variables are often unavailable.

Introduction

Electronic Health Records (EHRs) digitally capture a wealth of patient data generated during routine clinical care. In particular, the Intensive Care Unit (ICU) is a data-rich environment due to the need for continuous, high-resolution patient monitoring.
This has led to a surge of research in medical artificial intelligence (AI), with many studies leveraging publicly available EHR datasets in combination with machine learning techniques for tasks such as early warning, outcome prediction, and patient stratification [1, 2, 3, 4, 5, 6, 7, 8, 9]. A common approach in these studies is to model patient records as multivariate time series, capturing the temporal evolution of physiological and clinical variables. In practice, however, EHR time series are often irregularly sampled due to variations in clinical workflows, measurement protocols, and intervention timing.
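The variable-aware split described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy record, the use of the mean as the "representative summary value", and the zero placeholder for unobserved vital-sign steps are all assumptions made for illustration, and the plain string marker only stands in for what the paper describes as a learnable [Not measured] token.

```python
from statistics import mean

# Stand-in for the learnable token; here it is only a marker string (assumption).
NOT_MEASURED = "[Not measured]"


def encode_vital(values):
    """Keep every time step and attach an explicit observed flag instead of imputing.

    Missing steps become (0.0, 0); the 0.0 placeholder is an assumption.
    """
    return [(0.0, 0) if v is None else (float(v), 1) for v in values]


def summarize_lab(values):
    """Collapse a sporadically measured lab into one summary value.

    Assumption: the mean of the observed values is the summary statistic;
    the source does not say which statistic is used.
    """
    observed = [v for v in values if v is not None]
    return mean(observed) if observed else NOT_MEASURED


def encode_patient(record, vital_names, lab_names):
    """Route each variable to the encoder that matches its type."""
    vitals = {name: encode_vital(record[name]) for name in vital_names}
    labs = {name: summarize_lab(record[name]) for name in lab_names}
    return vitals, labs


# Hypothetical four-step record; None marks a step with no measurement.
record = {
    "heart_rate": [82, None, 85, 90],
    "lactate": [None, 2.0, None, 3.0],
    "troponin": [None, None, None, None],
}
vitals, labs = encode_patient(record, ["heart_rate"], ["lactate", "troponin"])
print(vitals["heart_rate"])
print(labs)
```

In this toy record the heart-rate series keeps its temporal ordering together with a per-step observed flag, the lactate series collapses to a single summary, and troponin, which was never measured, maps to the marker rather than to an imputed number.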
Sep-29-2025
- Genre:
- Research Report > New Finding (1.00)
- Industry:
- Health & Medicine
- Diagnostic Medicine (1.00)
- Health Care Technology > Medical Record (1.00)
- Therapeutic Area (1.00)