Simpson, Chloé
Federated Survival Analysis with Discrete-Time Cox Models
Andreux, Mathieu, Manoel, Andre, Menuet, Romuald, Saillard, Charlie, Simpson, Chloé
Building machine learning models from decentralized datasets located in different centers with federated learning (FL) is a promising approach to circumvent local data scarcity while preserving privacy. However, the prominent Cox proportional hazards (PH) model, used for survival analysis, does not fit the FL framework, as its loss function is non-separable with respect to the samples. The naïve method to bypass this non-separability consists in calculating the losses per center, and minimizing their sum as an approximation of the true loss. We show that the resulting model may suffer from important performance loss in some adverse settings. Instead, we leverage the discrete-time extension of the Cox PH model to formulate survival analysis as a classification problem with a separable loss function. Using this approach, we train survival models using standard FL techniques on synthetic data, as well as real-world datasets from The Cancer Genome Atlas (TCGA), showing similar performance to a Cox PH model trained on aggregated data. Compared to previous works, the proposed method is more communication-efficient, more generic, and more amenable to using privacy-preserving techniques.
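The separability argument in this abstract can be illustrated with a small numerical sketch. It is not the authors' implementation. It assumes a logistic-hazard parameterization of the discrete-time model (one intercept per time bin plus a shared linear term), quantile-based time bins, and random synthetic data; the function names `cox_partial_nll` and `discrete_time_nll` are hypothetical. The sketch compares each loss computed on pooled data with the sum of the same loss computed separately in two simulated centers.

```python
import numpy as np

def cox_partial_nll(beta, X, time, event):
    """Negative Cox partial log-likelihood (no special handling of ties)."""
    eta = X @ beta
    nll = 0.0
    for i in np.flatnonzero(event):
        # The risk set couples sample i to every sample still under observation,
        # which is why this loss does not split into per-sample terms.
        at_risk = time >= time[i]
        nll -= eta[i] - np.log(np.exp(eta[at_risk]).sum())
    return float(nll)

def discrete_time_nll(alpha, beta, X, bin_idx, event):
    """Negative log-likelihood of an assumed logistic-hazard discrete-time model."""
    eta = X @ beta
    logits = alpha[None, :] + eta[:, None]            # shape (n_samples, n_bins)
    bins = np.arange(len(alpha))[None, :]
    observed = bins <= bin_idx[:, None]               # bins where the sample was at risk
    label = (bins == bin_idx[:, None]) & event[:, None]  # 1 only in the event bin
    # Binary cross-entropy with logits: log(1 + exp(z)) - y * z
    bce = np.logaddexp(0.0, logits) - label * logits
    return float((bce * observed).sum())

# Synthetic data (illustrative only).
rng = np.random.default_rng(0)
n, d, n_bins = 200, 3, 5
X = rng.normal(size=(n, d))
time = rng.exponential(size=n)
event = rng.random(n) < 0.7
edges = np.quantile(time, np.linspace(0, 1, n_bins + 1)[1:-1])
bin_idx = np.searchsorted(edges, time)                # bin index in 0..n_bins-1

beta = rng.normal(size=d)
alpha = rng.normal(size=n_bins)
center = np.arange(n) % 2 == 0                        # two simulated centers
masks = (center, ~center)

pooled_cox = cox_partial_nll(beta, X, time, event)
split_cox = sum(cox_partial_nll(beta, X[m], time[m], event[m]) for m in masks)

pooled_dt = discrete_time_nll(alpha, beta, X, bin_idx, event)
split_dt = sum(discrete_time_nll(alpha, beta, X[m], bin_idx[m], event[m]) for m in masks)

print("Cox: pooled vs. sum of per-center losses:", pooled_cox, split_cox)
print("Discrete-time: pooled vs. sum of per-center losses:", pooled_dt, split_dt)
```

In this sketch the discrete-time loss summed over centers matches the pooled loss up to floating-point error, whereas the per-center Cox losses use truncated risk sets and therefore do not add up to the pooled partial likelihood.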
The Effectiveness of Multitask Learning for Phenotyping with Electronic Health Records Data
Ding, Daisy Yi, Simpson, Chloé, Pfohl, Stephen, Kale, Dave C., Jung, Kenneth, Shah, Nigam H.
Electronic phenotyping, which is the task of ascertaining whether an individual has a medical condition of interest by analyzing their medical records, is a foundational task in clinical informatics. Increasingly, electronic phenotyping is performed via supervised learning. We investigate the effectiveness of multitask learning for phenotyping using electronic health records (EHR) data. Multitask learning aims to improve model performance on a target task by jointly learning additional auxiliary tasks, and has been used to good effect in disparate areas of machine learning. However, its utility when applied to EHR data has not been established, and prior work suggests that its benefits are inconsistent. Here we present experiments that elucidate when multitask learning with neural networks can improve performance for electronic phenotyping using EHR data relative to well-tuned single task neural networks. We find that multitask networks consistently outperform single task networks for rare phenotypes but underperform for more common phenotypes. The effect size increases as more auxiliary tasks are added.