Salvaging Forbidden Treasure in Medical Data: Utilizing Surrogate Outcomes and Single Records for Rare Event Modeling
Yin, Xiaohui, Sacco, Shane, Aseltine, Robert H., Wang, Fei, Chen, Kun
The vast repositories of Electronic Health Records (EHR) and medical claims hold untapped potential for studying rare but critical events, such as suicide attempt. Conventional setups often model suicide attempt as a univariate outcome and also exclude any ``single-record'' patients with a single documented encounter due to a lack of historical information. However, patients who were diagnosed with suicide attempts at the only encounter could, to some surprise, represent a substantial proportion of all attempt cases in the data, as high as 70--80%. We innovate a hybrid and integrative learning framework to leverage concurrent outcomes as surrogates and harness the forbidden yet precious information from single-record data. Our approach employs a supervised learning component to learn the latent variables that connect primary (e.g., suicide) and surrogate outcomes (e.g., mental disorders) to historical information. It simultaneously employs an unsupervised learning component to utilize the single-record data, through the shared latent variables. As such, our approach offers a general strategy for information integration that is crucial to modeling rare conditions and events. With hospital inpatient data from Connecticut, we demonstrate that single-record data and concurrent diagnoses indeed carry valuable information, and utilizing them can substantially improve suicide risk modeling.
Jan-25-2025
- Country:
- Asia > Thailand (0.04)
- North America > United States
- Connecticut (0.24)
- New York (0.04)
- Kansas (0.04)
- California > Alameda County
- Berkeley (0.04)
- Genre:
- Research Report
- New Finding (1.00)
- Experimental Study (1.00)
- Research Report
- Industry:
- Technology:
- Information Technology
- Data Science (1.00)
- Artificial Intelligence > Machine Learning
- Statistical Learning (1.00)
- Neural Networks (1.00)
- Performance Analysis > Accuracy (0.93)
- Information Technology