Xiao, Cao
Drug Similarity Integration Through Attentive Multi-view Graph Auto-Encoders
Ma, Tengfei, Xiao, Cao, Zhou, Jiayu, Wang, Fei
Drug similarity has been studied to support downstream clinical tasks such as inferring novel properties of drugs (e.g. side effects, indications, interactions) from known properties. The growing availability of new types of drug features brings the opportunity to learn a more comprehensive and accurate drug similarity that represents the full spectrum of underlying drug relations. However, it is challenging to integrate this heterogeneous, noisy, nonlinearly related information to learn accurate similarity measures, especially when labels are scarce. Moreover, there is a trade-off between accuracy and interpretability. In this paper, we propose to learn accurate and interpretable similarity measures from multiple types of drug features. In particular, we model the integration using multi-view graph auto-encoders and add an attention mechanism that determines the weight of each view with respect to the corresponding tasks and features, for better interpretability. Our model has a flexible design for both semi-supervised and unsupervised settings. Experimental results demonstrated a significant improvement in predictive accuracy. Case studies also showed better model capacity (e.g. embedding node features) and interpretability.
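As a rough illustration of the attention idea in this abstract, the sketch below fuses several per-view drug similarity matrices using softmax-normalized weights. It is not the paper's model: the graph auto-encoder is omitted, the function names `softmax` and `fuse_views` are hypothetical, the attention scores are fixed by hand rather than learned, and the similarity values are made up.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_views(views, scores):
    """Combine per-view similarity matrices with attention weights.

    views  : list of n x n similarity matrices (lists of lists), one per view
    scores : one unnormalized attention score per view (hypothetical; in a
             trained model these would be learned, not supplied by hand)
    """
    weights = softmax(scores)
    n = len(views[0])
    fused = [[sum(w * v[i][j] for w, v in zip(weights, views))
              for j in range(n)]
             for i in range(n)]
    return fused, weights

# Two toy views over three drugs (illustrative numbers, not data from the paper).
chem_view = [[1.0, 0.8, 0.1], [0.8, 1.0, 0.2], [0.1, 0.2, 1.0]]
side_effect_view = [[1.0, 0.2, 0.7], [0.2, 1.0, 0.3], [0.7, 0.3, 1.0]]

# Equal scores give equal weights, so the fused matrix is the plain average.
fused, weights = fuse_views([chem_view, side_effect_view], scores=[0.0, 0.0])
```

Because the weights sum to one, they can be read directly as the relative contribution of each view, which is the kind of interpretability the abstract refers to.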
Multitask Dyadic Prediction and Its Application in Prediction of Adverse Drug-Drug Interaction
Jin, Bo (Dalian University of Technology) | Yang, Haoyu (Dalian University of Technology) | Xiao, Cao (IBM T.J.Watson Research Center) | Zhang, Ping (IBM T.J.Watson Research Center) | Wei, Xiaopeng (Dalian University of Technology) | Wang, Fei (Cornell University)
Adverse drug-drug interactions (DDIs) remain a leading cause of morbidity and mortality around the world. Identifying potential DDIs during the drug design process is critical in guiding targeted clinical drug safety testing. Although detection of adverse DDIs is conducted during Phase IV clinical trials, a large number of new DDIs are still found by accident after the drugs are put on the market. With the arrival of the big data era, more and more pharmaceutical research and development data are becoming available, providing an invaluable resource for insights that can potentially be leveraged in early prediction of DDIs. Many computational approaches have been proposed in recent years for DDI prediction. However, most of them focus on binary prediction (with or without DDI), despite the fact that each DDI is associated with a different type. Predicting the actual DDI type will help us better understand the DDI mechanism and identify proper ways to prevent it. In this paper, we formulate the DDI type prediction problem as a multitask dyadic regression problem, where the prediction of each specific DDI type is treated as a task. Compared with conventional matrix completion approaches, which can only impute the missing entries in the DDI matrix, our approach directly regresses those dyadic relationships (DDIs) and thus can be extended to new drugs more easily. We developed an effective proximal gradient method to solve the problem. Evaluation on real-world datasets is presented to demonstrate the effectiveness of the proposed approach.
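The abstract does not state the objective or the regularizer, so the following is only a generic sketch of a proximal gradient method, assuming an L1-regularized least-squares problem for a single task. In a dyadic setting each row of `X` could hold features of one drug pair and `y` the score for one DDI type, but that mapping, the function names, and the toy data are assumptions for illustration.

```python
def soft_threshold(x, t):
    """Proximal operator of t * |x| (shrinks x toward zero by t)."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0

def proximal_gradient_lasso(X, y, lam, step, iters=500):
    """Minimize 0.5 * ||X w - y||^2 + lam * ||w||_1.

    Each iteration takes a gradient step on the smooth least-squares term,
    then applies the proximal operator of the L1 term. `step` should be at
    most 1 / L, where L is the largest eigenvalue of X^T X.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(iters):
        # Residual X w - y and gradient X^T (X w - y) of the smooth part.
        resid = [sum(X[i][j] * w[j] for j in range(d)) - y[i] for i in range(n)]
        grad = [sum(X[i][j] * resid[i] for i in range(n)) for j in range(d)]
        # Gradient step followed by the proximal (shrinkage) step.
        w = [soft_threshold(w[j] - step * grad[j], step * lam) for j in range(d)]
    return w

# Toy problem with an identity design, where the solution is soft_threshold(y, lam).
X = [[1.0, 0.0], [0.0, 1.0]]
y = [3.0, 0.5]
w = proximal_gradient_lasso(X, y, lam=1.0, step=1.0)
```

The shrinkage step is what sets small coefficients exactly to zero, which is the usual reason to prefer a proximal method over plain gradient descent for non-smooth penalties.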
Adverse Drug Reaction Prediction with Symbolic Latent Dirichlet Allocation
Xiao, Cao (IBM T.J.Watson Research Center) | Zhang, Ping (IBM T.J.Watson Research Center) | Chaovalitwongse, W. Art (University of Arkansas) | Hu, Jianying (IBM T.J.Watson Research Center) | Wang, Fei (Cornell University)
Adverse drug reactions (ADRs) are a major burden for patients and the healthcare industry. They often cause preventable hospitalizations and deaths and are associated with substantial costs. Traditional preclinical in vitro safety profiling and clinical safety trials are restricted by small scale, long duration, huge financial costs and limited statistical significance. The availability of large amounts of drug and ADR data potentially allows ADR predictions during the drugs’ early preclinical stage with data analytics methods to inform more targeted clinical safety tests. Despite their initial success, existing methods have trade-offs among interpretability, predictive power and efficiency. This urges us to explore methods that combine all these strengths and provide practical solutions for real-world ADR predictions. We cast the ADR-drug relation structure into a three-layer hierarchical Bayesian model. We interpret each ADR as a symbolic word and apply latent Dirichlet allocation (LDA) to learn topics that may represent certain biochemical mechanisms that relate ADRs to drug structures. Based on LDA, we designed an equivalent regularization term to incorporate the hierarchical ADR domain knowledge. Finally, we developed a mixed input model leveraging a fast collapsed Gibbs sampling method in which the complexity of each iteration is proportional only to the number of positive ADRs. Experiments on real-world data show that our models achieved higher prediction accuracy and shorter running time than the state-of-the-art alternatives.
An Efficient Time Series Subsequence Pattern Mining and Prediction Framework with an Application to Respiratory Motion Prediction
Wang, Shouyi (University of Texas at Arlington) | Kam, Kinming (University of Texas at Arlington) | Xiao, Cao (University of Washington) | Bowen, Stephen (University of Washington) | Chaovalitwongse, Wanpracha Art (University of Washington)
Traditional time series analysis methods are limited on some complex real-world time series data. Respiratory motion prediction is one such challenging problem. Memory-based nearest neighbor approaches have shown potential in predicting complex nonlinear time series compared to many traditional parametric prediction models. However, the massive time series subsequence representation, the similarity distance measures, the number of nearest neighbors, and the ensemble functions create challenges and limit the performance of nearest neighbor approaches in complex time series prediction. To address these problems, we propose a flexible time series pattern representation and selection framework, called the orthogonal-polynomial-based variant-nearest-neighbor (OPVNN) approach. For the respiratory motion prediction problem, the proposed approach achieved the highest and most robust prediction performance compared to the state-of-the-art time series prediction methods. With a solid mathematical and theoretical foundation in orthogonal polynomials, the proposed time series representation, subsequence pattern mining and prediction framework has great potential to benefit industry and medical applications that need to handle highly nonlinear and complex time series data streams, such as quasi-periodic ones.
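The general idea of representing subsequences by orthogonal polynomial coefficients and predicting from nearest neighbors can be sketched as follows. This is not the OPVNN algorithm itself: the abstract does not specify the polynomial family, the distance measure, or the "variant" neighbor selection, so the sketch assumes a discrete orthonormal basis built by Gram-Schmidt, Euclidean distance in coefficient space, and a plain average over a fixed `k`. All function names are hypothetical.

```python
import math

def orthonormal_basis(m, degree):
    """Gram-Schmidt on 1, t, t^2, ... sampled at m evenly spaced points in [-1, 1]."""
    ts = [-1.0 + 2.0 * i / (m - 1) for i in range(m)]
    basis = []
    for p in range(degree + 1):
        v = [t ** p for t in ts]
        for b in basis:
            proj = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - proj * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        basis.append([vi / norm for vi in v])
    return basis

def encode(window, basis):
    """Project a window onto the basis, giving degree + 1 coefficients."""
    return [sum(x * b for x, b in zip(window, bvec)) for bvec in basis]

def knn_predict(series, m, degree, k):
    """Predict the value following `series` from the k most similar past windows."""
    basis = orthonormal_basis(m, degree)
    query = encode(series[-m:], basis)
    candidates = []
    # Only windows whose next value is already known can serve as neighbors.
    for start in range(len(series) - m):
        coeffs = encode(series[start:start + m], basis)
        dist = math.sqrt(sum((c - q) ** 2 for c, q in zip(coeffs, query)))
        candidates.append((dist, series[start + m]))
    candidates.sort(key=lambda c: c[0])
    nearest = candidates[:k]
    return sum(value for _, value in nearest) / len(nearest)

# Toy periodic signal (illustrative only, not respiratory data).
series = [0.0, 1.0, 0.0, -1.0] * 5
prediction = knn_predict(series, m=4, degree=2, k=3)
```

Comparing windows through a few polynomial coefficients rather than raw samples keeps the neighbor search low-dimensional, at the cost of discarding detail above the chosen degree.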