Ghassemi, Marzyeh
Rethinking clinical prediction: Why machine learning must consider year of care and feature aggregation
Nestor, Bret, McDermott, Matthew B. A., Chauhan, Geeticka, Naumann, Tristan, Hughes, Michael C., Goldenberg, Anna, Ghassemi, Marzyeh
Machine learning for healthcare often trains models on de-identified datasets with randomly-shifted calendar dates, ignoring the fact that data were generated under hospital operation practices that change over time. These changing practices induce definitive changes in the observed data, which confound evaluations that do not account for dates and limit the generalisability of date-agnostic models. In this work, we establish the magnitude of this problem on MIMIC, a public hospital dataset, and showcase a simple solution. We augment MIMIC with the year in which care was provided and show that a model trained using standard feature representations will significantly degrade in quality over time. We find a deterioration of 0.3 AUC when evaluating mortality prediction on data from 10 years later, and a smaller deterioration of 0.15 AUC for length-of-stay. In contrast, we demonstrate that clinically-oriented aggregates of raw features significantly mitigate future deterioration. Our suggested aggregated representations, when retrained yearly, have prediction quality comparable to year-agnostic models.
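As a rough illustration of the year-aware evaluation the abstract argues for, the sketch below scores a fixed model's predictions separately for each year of care, so that drift in AUC becomes visible. It is not the authors' code: the function names `auc` and `auc_by_year`, the record layout, and the toy numbers are all assumptions made for this example.

```python
from collections import defaultdict


def auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_by_year(records):
    """Group (year, label, score) records by year of care and score each year."""
    by_year = defaultdict(lambda: ([], []))
    for year, label, score in records:
        by_year[year][0].append(label)
        by_year[year][1].append(score)
    return {year: auc(ys, ss) for year, (ys, ss) in sorted(by_year.items())}


# Toy, made-up records: (year of care, true label, model score).
records = [
    (2001, 1, 0.9), (2001, 0, 0.2), (2001, 1, 0.8), (2001, 0, 0.1),
    (2011, 1, 0.4), (2011, 0, 0.6), (2011, 1, 0.7), (2011, 0, 0.3),
]
print(auc_by_year(records))  # {2001: 1.0, 2011: 0.75}
```

Reporting one AUC per year, rather than a single pooled number over randomly shifted dates, is what exposes the kind of deterioration described above.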
The Effect of Heterogeneous Data for Alzheimer's Disease Detection from Speech
Balagopalan, Aparna, Novikova, Jekaterina, Rudzicz, Frank, Ghassemi, Marzyeh
Speech datasets for identifying Alzheimer's disease (AD) are generally restricted to participants performing a single task, e.g. describing an image shown to them. As a result, models trained on linguistic features derived from such datasets may not be generalizable across tasks. Building on prior work demonstrating that same-task data of healthy participants helps improve AD detection on a single-task dataset of pathological speech, we augment an AD-specific dataset consisting of subjects describing a picture with multi-task healthy data. We demonstrate that normative data from multiple speech-based tasks helps improve AD detection by up to 9%. Visualization of decision boundaries reveals that models trained on a combination of structured picture descriptions and unstructured conversational speech have the least out-of-task error and show the most potential to generalize to multiple tasks. We analyze the impact of the age of the added samples and whether it affects fairness in classification. We also provide explanations for a possible inductive bias effect across tasks using model-agnostic feature anchors. This work highlights the need for heterogeneous datasets for encoding changes in multiple facets of cognition and for developing a task-independent AD detection model.
Machine Learning for Health (ML4H) Workshop at NeurIPS 2018
Antropova, Natalia, Beam, Andrew L., Beaulieu-Jones, Brett K., Chen, Irene, Chivers, Corey, Dalca, Adrian, Finlayson, Sam, Fiterau, Madalina, Fries, Jason Alan, Ghassemi, Marzyeh, Hughes, Mike, Jedynak, Bruno, Kandola, Jasvinder S., McDermott, Matthew, Naumann, Tristan, Schulam, Peter, Shamout, Farah, Yahi, Alexandre
Modeling Mistrust in End-of-Life Care
Boag, Willie, Suresh, Harini, Celi, Leo Anthony, Szolovits, Peter, Ghassemi, Marzyeh
In this work, we characterize the doctor-patient relationship using a machine learning-derived trust score. We show that this score has statistically significant racial associations, and that by modeling trust directly we find stronger disparities in care than by stratifying on race. We further demonstrate that mistrust is indicative of worse outcomes, but is only weakly associated with physiologically-created severity scores. Finally, we describe sentiment analysis experiments indicating patients with higher levels of mistrust have worse experiences and interactions with their caregivers. This work is a step towards measuring fairer machine learning in the healthcare domain.
Opportunities in Machine Learning for Healthcare
Ghassemi, Marzyeh, Naumann, Tristan, Schulam, Peter, Beam, Andrew L., Ranganath, Rajesh
Healthcare is a natural arena for the application of machine learning, especially as modern electronic health records (EHRs) provide increasingly large amounts of data to answer clinically meaningful questions. However, clinical data and practice present unique challenges that complicate the use of common methodologies. This article serves as a primer on addressing these challenges and highlights opportunities for members of the machine learning and data science communities to contribute to this growing domain.
Semi-Supervised Biomedical Translation With Cycle Wasserstein Regression GANs
McDermott, Matthew B. A. (MIT) | Yan, Tom (MIT) | Naumann, Tristan (MIT) | Hunt, Nathan (MIT) | Suresh, Harini (MIT) | Szolovits, Peter (MIT) | Ghassemi, Marzyeh (MIT)
The biomedical field offers many learning tasks that share unique challenges: large amounts of unpaired data, and a high cost to generate labels. In this work, we develop a method to address these issues with semi-supervised learning in regression tasks (e.g., translation from source to target). Our model uses adversarial signals to learn from unpaired datapoints, and imposes a cycle-loss reconstruction error penalty to regularize mappings in either direction against one another. We first evaluate our method on synthetic experiments, demonstrating two primary advantages of the system: 1) distribution matching via the adversarial loss and 2) regularization towards invertible mappings via the cycle loss. We then show a regularization effect and improved performance when paired data is supplemented by additional unpaired data on two real biomedical regression tasks: estimating the physiological effect of medical treatments, and extrapolating gene expression (transcriptomics) signals. Our proposed technique is a promising initial step towards more robust use of adversarial signals in semi-supervised regression, and could be useful for other tasks (e.g., causal inference or modality translation) in the biomedical field.
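The cycle-loss idea in this abstract can be illustrated with a minimal sketch: map each unpaired source point to the target space and back (and each target point the other way), then penalise the reconstruction error. The scalar maps, the mean absolute error, and the names `cycle_loss`, `forward`, `exact_inverse`, and `poor_inverse` are assumptions for illustration only; the abstract does not specify the exact loss used in the paper, and the adversarial component is omitted here.

```python
def mean_abs_error(xs, ys):
    """Mean absolute difference between two equal-length sequences."""
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)


def cycle_loss(forward, backward, sources, targets):
    """Reconstruction error of source->target->source plus target->source->target.

    `sources` and `targets` need not be paired with one another.
    """
    source_cycle = [backward(forward(x)) for x in sources]
    target_cycle = [forward(backward(y)) for y in targets]
    return mean_abs_error(source_cycle, sources) + mean_abs_error(target_cycle, targets)


def forward(x):
    """Hypothetical source-to-target map."""
    return 2.0 * x + 1.0


def exact_inverse(y):
    """Target-to-source map that inverts `forward` exactly."""
    return (y - 1.0) / 2.0


def poor_inverse(y):
    """Target-to-source map that ignores the offset in `forward`."""
    return y / 2.0


sources = [0.0, 1.0, 2.0]  # unpaired toy source samples
targets = [1.0, 3.0, 5.0]  # unpaired toy target samples

print(cycle_loss(forward, exact_inverse, sources, targets))  # 0.0
print(cycle_loss(forward, poor_inverse, sources, targets))   # 1.5
```

The penalty is zero only when the two maps invert one another on the observed data, which is the sense in which the cycle loss regularises towards invertible mappings.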
Short-term Mortality Prediction for Elderly Patients Using Medicare Claims Data
Makar, Maggie, Ghassemi, Marzyeh, Cutler, David, Obermeyer, Ziad
Risk prediction is central to both clinical medicine and public health. While many machine learning models have been developed to predict mortality, they are rarely applied in the clinical literature, where classification tasks typically rely on logistic regression. One reason for this is that existing machine learning models often seek to optimize predictions by incorporating features that are not present in the databases readily available to providers and policy makers, limiting generalizability and implementation. Here we test a number of machine learning classifiers for prediction of six-month mortality in a population of elderly Medicare beneficiaries, using an administrative claims database of the kind available to the majority of health care payers and providers. We show that machine learning classifiers substantially outperform current widely-used methods of risk prediction, but only when used with an improved feature set, developed for this study, that incorporates insights from clinical medicine. Our work has applications to supporting patient and provider decision making at the end of life, as well as population health-oriented efforts to identify patients at high risk of poor outcomes.
A Multivariate Timeseries Modeling Approach to Severity of Illness Assessment and Forecasting in ICU with Sparse, Heterogeneous Clinical Data
Ghassemi, Marzyeh (Massachusetts Institute of Technology) | Pimentel, Marco A.F. (University of Oxford) | Naumann, Tristan (Massachusetts Institute of Technology) | Brennan, Thomas (Massachusetts Institute of Technology) | Clifton, David A. (University of Oxford) | Szolovits, Peter (Massachusetts Institute of Technology) | Feng, Mengling (Massachusetts Institute of Technology)
The ability to determine patient acuity (or severity of illness) has immediate practical use for clinicians. We evaluate the use of multivariate timeseries modeling with multi-task Gaussian process (GP) models using noisy, incomplete, sparse, heterogeneous and unevenly-sampled clinical data, including both physiological signals and clinical notes. The learned multi-task GP (MTGP) hyperparameters are then used to assess and forecast patient acuity. Experiments were conducted with two real clinical data sets acquired from ICU patients: first, estimating cerebrovascular pressure reactivity, an important indicator of secondary damage for traumatic brain injury patients, by learning the interactions between intracranial pressure and mean arterial blood pressure signals; and second, predicting mortality using clinical progress notes. In both cases, MTGPs provided improved results: an MTGP model provided better results than single-task GP models for signal interpolation and forecasting (0.91 vs 0.69 RMSE), and MTGP hyperparameters improved results when used as additional classification features (0.812 vs 0.788 AUC).