P-CAFE: Personalized Cost-Aware Incremental Feature Selection For Electronic Health Records

Kashani, Naama, Cohen, Mira, Shaham, Uri

arXiv.org Artificial Intelligence 

Electronic Health Records (EHRs) serve as comprehensive digital repositories of patient health information, encompassing both structured and unstructured data (Bates et al., 2014). A thorough understanding of EHR data can significantly enhance various aspects of patient care, including disease prediction, healthcare quality improvement, and resource allocation (Shickel et al., 2018; Kim et al., 2019). However, EHR data presents unique challenges: it is often high-dimensional, multimodal, sparse, and temporal (Wu et al., 2010; Menachemi and Collum, 2011; Xiao et al., 2018). Records typically include a diverse array of modalities, such as demographics, diagnoses, procedures, medications, prescriptions, radiological images, clinical notes, and laboratory results. The data is inherently sparse, as medical events occur irregularly, and sequential, as patient histories accumulate over time. To address these complexities, many approaches employ feature selection (FS) -- the process of identifying the most informative variables from high-dimensional input to improve model performance, interpretability, and robustness (Remeseiro and Bolon-Canedo, 2019; Chandrashekar and Sahin, 2014). Yet, to the best of our knowledge, existing FS methods applied to EHRs either ignore multimodality or fail to capture temporal dynamics.