Collaborating Authors

 Xiao, Cao


Change Matters: Medication Change Prediction with Recurrent Residual Networks

arXiv.org Artificial Intelligence

Deep learning is revolutionizing predictive healthcare, including recommending medications to patients with complex health conditions. Existing approaches focus on predicting all medications for the current visit, which often overlaps with medications from previous visits. A more clinically relevant task is to identify medication changes. In this paper, we propose a new recurrent residual network, named MICRON, for medication change prediction. MICRON takes the changes in patient health records as input and learns to update a hidden medication vector and the medication set recurrently with a reconstruction design. The medication vector is like the memory cell that encodes longitudinal information of medications. Unlike traditional methods that require the entire patient history for prediction, MICRON has a residual-based inference that allows for sequential updating based only on new patient features (e.g., new diagnoses in the recent visit) more efficiently. We evaluated MICRON on real inpatient and outpatient datasets. MICRON achieves 3.5% and 7.8% relative improvements over the best baseline in F1 score, respectively. MICRON also requires fewer parameters, which significantly reduces the training time to 38.3s per epoch with 1.5x speed-up.
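
The recurrent update described above can be illustrated with a minimal sketch. The function name `update_medications`, the purely additive update, and the two thresholds `add_thr` and `remove_thr` are assumptions made for illustration; the abstract does not specify how the medication vector is turned into a medication set, so this is not the authors' implementation.

```python
def update_medications(med_scores, residual, current_meds,
                       add_thr=0.8, remove_thr=0.2):
    """Update a medication score vector and medication set for one new visit.

    med_scores   : list of floats, one score per medication (the "memory")
    residual     : list of floats, predicted change derived from new features
    current_meds : set of medication indices prescribed at the previous visit
    Thresholds are hypothetical: scores above add_thr add a medication,
    scores below remove_thr remove it, anything in between is left unchanged.
    """
    # Residual step: only the change is added, the history is not re-encoded.
    new_scores = [s + r for s, r in zip(med_scores, residual)]
    to_add = {i for i, s in enumerate(new_scores) if s >= add_thr}
    to_remove = {i for i, s in enumerate(new_scores) if s <= remove_thr}
    new_meds = (set(current_meds) | to_add) - to_remove
    return new_scores, new_meds


# Visit t: medication 0 is active. The new visit suggests dropping 0, adding 1.
scores, meds = update_medications([0.9, 0.5, 0.1], [-0.8, 0.4, 0.0], {0})
# Visit t+1: no change in patient features, so the set should stay the same.
scores2, meds2 = update_medications(scores, [0.0, 0.0, 0.0], meds)
```

Because each call consumes only the previous scores and the new residual, inference cost per visit does not grow with the length of the patient history, which is the efficiency argument the abstract makes.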


SCRIB: Set-classifier with Class-specific Risk Bounds for Blackbox Models

arXiv.org Machine Learning

Despite the success of deep learning (DL) in classification problems, DL classifiers do not provide a sound mechanism to decide when to refrain from predicting. Recent works have tried to control the overall prediction risk using classification with rejection options. However, existing works overlook the different significance of different classes. We introduce Set-classifier with Class-specific RIsk Bounds (SCRIB) to tackle this problem, assigning multiple labels to each example. Given the output of a black-box model on the validation set, SCRIB constructs a set-classifier that controls the class-specific prediction risks with a theoretical guarantee. The key idea is to reject when the set classifier returns more than one label. We validated SCRIB on several medical applications, including sleep staging on electroencephalogram (EEG) data, X-ray COVID image classification, and atrial fibrillation detection based on electrocardiogram (ECG) data. SCRIB obtained desirable class-specific risks, which are 35%-88% closer to the target risks than baseline methods.
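
The set-classifier idea can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the SCRIB algorithm itself: the per-class thresholds are given by hand rather than searched for, an empty prediction set is also treated as a rejection, and the risk definition (confident but wrong predictions among examples of each true class) is an assumed reading of "class-specific risk". All function names are hypothetical.

```python
def predict_set(probs, thresholds):
    """Return every class whose black-box probability clears its own threshold."""
    return {k for k, (p, t) in enumerate(zip(probs, thresholds)) if p >= t}


def decide(probs, thresholds):
    """Return a single label, or None to reject (ambiguous or empty set)."""
    labels = predict_set(probs, thresholds)
    if len(labels) == 1:
        return next(iter(labels))
    return None


def class_specific_risk(val_probs, val_labels, thresholds, num_classes):
    """Per true class: fraction of examples given a confident, wrong label."""
    errors = [0] * num_classes
    counts = [0] * num_classes
    for probs, y in zip(val_probs, val_labels):
        counts[y] += 1
        pred = decide(probs, thresholds)
        if pred is not None and pred != y:
            errors[y] += 1
    return [e / c if c else 0.0 for e, c in zip(errors, counts)]


# Toy validation set: black-box outputs for a two-class problem.
val_probs = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.7, 0.3]]
val_labels = [0, 0, 1, 1]

risk_strict = class_specific_risk(val_probs, val_labels, [0.5, 0.5], 2)
# Lowering the class-1 threshold makes borderline cases ambiguous, so they
# are rejected instead of being misclassified.
risk_lenient = class_specific_risk(val_probs, val_labels, [0.5, 0.25], 2)
```

In this toy example, lowering the threshold for class 1 trades two rejections for the removal of the one confident error on class 1, which is the kind of per-class trade-off a threshold search would tune against target risks.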


HINT: Hierarchical Interaction Network for Trial Outcome Prediction Leveraging Web Data

arXiv.org Artificial Intelligence

Clinical trials are crucial for drug development but are time-consuming, expensive, and often burdensome on patients. More importantly, clinical trials face uncertain outcomes due to issues with efficacy, safety, or problems with patient recruitment. If we were better at predicting the results of clinical trials, we could avoid having to run trials that will inevitably fail, so that more resources could be devoted to trials that are likely to succeed. In this paper, we propose Hierarchical INteraction Network (HINT) for more general clinical trial outcome predictions for all diseases based on a comprehensive and diverse set of web data including molecule information of the drugs, target disease information, trial protocol and biomedical knowledge. HINT first encodes these multi-modal data into latent embeddings, where an imputation module is designed to handle missing data. Next, these embeddings are fed into the knowledge embedding module to generate knowledge embeddings that are pretrained using external knowledge on pharmaco-kinetic properties and trial risk from the web. Then the interaction graph module connects all the embeddings via domain knowledge to fully capture various trial components and their complex relations as well as their influences on trial outcomes. Finally, HINT learns a dynamic attentive graph neural network to predict trial outcome. Comprehensive experimental results show that HINT achieves strong predictive performance, obtaining 0.772, 0.607, 0.623, 0.703 on PR-AUC for Phase I, II, III, and indication outcome prediction, respectively. It also consistently outperforms the best baseline method by up to 12.4% on PR-AUC.


Fast Graph Attention Networks Using Effective Resistance Based Graph Sparsification

arXiv.org Machine Learning

The attention mechanism has demonstrated superior performance for inference over nodes in graph neural networks (GNNs); however, it results in a high computational burden during both training and inference. We propose FastGAT, a method to make attention based GNNs lightweight by using spectral sparsification to generate an optimal pruning of the input graph. This results in a per-epoch time that is almost linear in the number of graph nodes as opposed to quadratic. We theoretically prove that spectral sparsification preserves the features computed by the GAT model, thereby justifying our FastGAT algorithm. We experimentally evaluate FastGAT on several large real world graph datasets for node classification tasks under both inductive and transductive settings. FastGAT can dramatically reduce (up to 10x) the computational time and memory requirements, allowing the usage of attention based GNNs on large graphs. Graphs are efficient representations of pairwise relations, with many real-world applications including product co-purchasing networks (McAuley et al., 2015), coauthor networks (Hamilton et al., 2017b), etc. Graph neural networks (GNNs) have become popular as a tool for inference from graph based data. By leveraging the geometric structure of the graph, GNNs learn improved representations of the graph nodes and edges that can lead to better performance in various inference tasks (Kipf & Welling, 2016; Hamilton et al., 2017a; Veličković et al., 2018). More recently, the attention mechanism has demonstrated superior performance for inference over nodes in GNNs (Veličković et al., 2018; Xinyi & Chen, 2019; Thekumparampil et al., 2018; Lee et al., 2020; Bianchi et al., 2019; Knyazev et al., 2019). However, attention based GNNs suffer from huge computational cost. This may hinder the applicability of the attention mechanism to large graphs.
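
Effective-resistance sparsification can be sketched on a small graph as follows. This is a generic illustration in the style of Spielman and Srivastava's sampling scheme, not the FastGAT code: it assumes an unweighted, connected graph, computes resistances by a dense solve (practical only for tiny graphs), and keeps the distinct sampled edges without reweighting. The function names `solve`, `effective_resistances` and `sparsify` are hypothetical.

```python
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / aug[i][i]
    return x


def effective_resistances(num_nodes, edges):
    """Effective resistance of each unit-weight edge in a connected graph."""
    n = num_nodes
    lap = [[0.0] * n for _ in range(n)]  # graph Laplacian L = D - A
    for u, v in edges:
        lap[u][u] += 1.0
        lap[v][v] += 1.0
        lap[u][v] -= 1.0
        lap[v][u] -= 1.0
    m = n - 1  # ground the last node so the reduced system is invertible
    reduced = [row[:m] for row in lap[:m]]
    resistances = []
    for u, v in edges:
        rhs = [0.0] * n
        rhs[u] += 1.0  # inject one unit of current at u ...
        rhs[v] -= 1.0  # ... and extract it at v
        potentials = solve(reduced, rhs[:m]) + [0.0]
        resistances.append(potentials[u] - potentials[v])
    return resistances


def sparsify(num_nodes, edges, num_samples, seed=0):
    """Sample edges with probability proportional to effective resistance."""
    rng = random.Random(seed)
    weights = effective_resistances(num_nodes, edges)
    picked = rng.choices(range(len(edges)), weights=weights, k=num_samples)
    return sorted({edges[i] for i in picked})


# Triangle (0, 1, 2) with a pendant edge to node 3.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
res = effective_resistances(4, edges)
kept = sparsify(4, edges, num_samples=3)
```

Edges inside the triangle are redundant (resistance 2/3 each), while the pendant edge is the only route to node 3 (resistance 1), so it is sampled with the highest probability. For a connected unweighted graph the resistances sum to the number of nodes minus one, which gives a quick sanity check.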


MIMOSA: Multi-constraint Molecule Sampling for Molecule Optimization

arXiv.org Artificial Intelligence

Molecule optimization is a fundamental task for accelerating drug discovery, with the goal of generating new valid molecules that maximize multiple drug properties while maintaining similarity to the input molecule. Existing generative models and reinforcement learning approaches have achieved initial success, but still face difficulties in simultaneously optimizing multiple drug properties. To address such challenges, we propose the MultI-constraint MOlecule SAmpling (MIMOSA) approach, a sampling framework that uses the input molecule as an initial guess and samples molecules from the target distribution. MIMOSA first pretrains two property-agnostic graph neural networks (GNNs) for molecule topology and substructure-type prediction, where a substructure can be either an atom or a single ring. For each iteration, MIMOSA uses the GNNs' predictions and employs three basic substructure operations (add, replace, delete) to generate new molecules and associated weights. The weights can encode multiple constraints including similarity and drug property constraints, upon which we select promising molecules for the next iteration. MIMOSA enables flexible encoding of multiple property and similarity constraints and can efficiently generate new molecules that satisfy various property constraints, achieving up to 49.6% relative improvement over the best baseline in terms of success rate.


COMPOSE: Cross-Modal Pseudo-Siamese Network for Patient Trial Matching

arXiv.org Artificial Intelligence

Clinical trials play important roles in drug development but often suffer from expensive, inaccurate and insufficient patient recruitment. The availability of massive electronic health records (EHR) data and trial eligibility criteria (EC) brings a new opportunity for data-driven patient recruitment. One key task, named patient-trial matching, is to find qualified patients for clinical trials given structured EHR and unstructured EC text (both inclusion and exclusion criteria). How to match complex EC text with longitudinal patient EHRs? How to embed many-to-many relationships between patients and trials? How to explicitly handle the difference between inclusion and exclusion criteria? In this paper, we propose the CrOss-Modal PseudO-SiamEse network (COMPOSE) to address these challenges for patient-trial matching. One path of the network encodes EC using a convolutional highway network. The other path processes EHR with a multi-granularity memory network that encodes structured patient records into multiple levels based on medical ontology. Using the EC embedding as a query, COMPOSE performs attentional record alignment and thus enables dynamic patient-trial matching. COMPOSE also introduces a composite loss term to maximize the similarity between patient records and inclusion criteria while minimizing the similarity to the exclusion criteria. Experiment results show COMPOSE can reach 98.0% AUC on patient-criteria matching and 83.7% accuracy on patient-trial matching, a 24.3% improvement over the best baseline on real-world patient-trial matching tasks.
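
The composite loss idea (pull patient records toward inclusion criteria, push them away from exclusion criteria) can be sketched as below. The abstract does not give the loss formula, so the cosine similarity, the hinge with a `margin` parameter, and the function names are all assumptions chosen for illustration rather than COMPOSE's actual definition.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def composite_loss(patient_emb, criterion_emb, is_inclusion, margin=0.0):
    """Hypothetical per-criterion loss.

    Inclusion criterion: loss falls as similarity rises (maximize similarity).
    Exclusion criterion: loss is zero once similarity drops to the margin
    (minimize similarity).
    """
    sim = cosine(patient_emb, criterion_emb)
    if is_inclusion:
        return 1.0 - sim
    return max(0.0, sim - margin)


patient = [1.0, 0.0]
matching_criterion = [1.0, 0.0]    # embedding aligned with the patient
unrelated_criterion = [0.0, 1.0]   # embedding orthogonal to the patient

loss_incl_match = composite_loss(patient, matching_criterion, True)
loss_excl_match = composite_loss(patient, matching_criterion, False)
loss_incl_unrelated = composite_loss(patient, unrelated_criterion, True)
loss_excl_unrelated = composite_loss(patient, unrelated_criterion, False)
```

A patient whose embedding aligns with an inclusion criterion incurs no loss, while the same alignment with an exclusion criterion is penalized, which captures the asymmetry between the two criterion types that the abstract highlights.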


CHEER: Rich Model Helps Poor Model via Knowledge Infusion

arXiv.org Machine Learning

There is a growing interest in applying deep learning (DL) to healthcare, driven by the availability of data with multiple feature channels in rich-data environments (e.g., intensive care units). However, in many other practical situations, we can only access data with far fewer feature channels in poor-data environments (e.g., at home), which often results in predictive models with poor performance. How can we boost the performance of models learned from such a poor-data environment by leveraging knowledge extracted from existing models trained using rich data in a related environment? To address this question, we develop a knowledge infusion framework named CHEER that can succinctly summarize such a rich model into transferable representations, which can be incorporated into the poor model to improve its performance. The infused model is analyzed theoretically and evaluated empirically on several datasets. Our empirical results showed that CHEER outperformed baselines by 5.60% to 46.80% in terms of the macro-F1 score on multiple physiological datasets.


SLEEPER: interpretable Sleep staging via Prototypes from Expert Rules

arXiv.org Machine Learning

Sleep staging is a crucial task for diagnosing sleep disorders. It is tedious and complex as it can take a trained expert several hours to annotate just one patient's polysomnogram (PSG) from a single night. Although deep learning models have demonstrated state-of-the-art performance in automating sleep staging, interpretability, which defines other desiderata, has remained largely unexplored. In this study, we propose Sleep staging via Prototypes from Expert Rules (SLEEPER), which combines deep learning models with expert-defined rules using a prototype learning framework to generate simple interpretable models. In particular, SLEEPER utilizes sleep scoring rules and expert-defined features to derive prototypes, which are embeddings of PSG data fragments via convolutional neural networks. The final models are simple interpretable models like a shallow decision tree defined over those phenotypes. We evaluated SLEEPER using two PSG datasets collected from sleep studies and demonstrated that SLEEPER could provide accurate sleep stage classification comparable to human experts and deep neural networks, with about 85% ROC-AUC and 0.7 kappa.


GENN: Predicting Correlated Drug-drug Interactions with Graph Energy Neural Networks

arXiv.org Machine Learning

Gaining more comprehensive knowledge about drug-drug interactions (DDIs) is one of the most important tasks in drug development and medical practice. Recently, graph neural networks have achieved great success in this task by modeling drugs as nodes and drug-drug interactions as links and casting DDI predictions as link prediction problems. However, correlations between link labels (e.g., DDI types) were rarely considered in existing works. We propose the graph energy neural network (GENN) to explicitly model link type correlations. We formulate the DDI prediction task as a structure prediction problem, and introduce a new energy-based model where the energy function is defined by graph neural networks. Experiments on two real world DDI datasets demonstrated that GENN is superior to many baselines that do not consider link type correlations, achieving 13.77% and 5.01% PR-AUC improvement on the two datasets, respectively. We also present a case study in which GENN can better capture meaningful DDI correlations compared with baseline models. The use of drug combinations is common and often necessary for treating patients with complex diseases.


Rare Disease Detection by Sequence Modeling with Generative Adversarial Networks

arXiv.org Machine Learning

Rare diseases, which affect 350 million individuals, are commonly associated with delayed diagnosis or misdiagnosis. To improve those patients' outcomes, rare disease detection is an important task for identifying patients with rare conditions based on longitudinal medical claims. In this paper, we present a deep learning method for detecting patients with exocrine pancreatic insufficiency (EPI), a rare disease. The contributions include 1) a large longitudinal study using 7 years of medical claims from 1.8 million patients including 29,149 EPI patients, 2) a new deep learning model using generative adversarial networks (GANs) to boost the rare disease class, while also leveraging recurrent neural networks to model patient sequence data, and 3) an accurate prediction with 0.56 PR-AUC, which outperformed benchmark models in terms of precision and recall.