Drug response prediction by ensemble learning and drug-induced gene expression signatures

arXiv.org Machine Learning

The chemotherapeutic response of cancer cells to a given compound is among the most fundamental pieces of information needed to design anti-cancer drugs. Recent advances in producing large drug screens against cancer cell lines have provided an opportunity to apply machine learning methods for this purpose. In addition to cytotoxicity databases, a considerable amount of drug-induced gene expression data has also become publicly available. Following this, several methods that exploit omics data were proposed to predict drug activity on cancer cells. However, due to the complexity of cancer drug mechanisms, none of the existing methods is perfect. One possible direction, therefore, is to combine the strengths of both the methods and the databases for improved performance. We demonstrate that integrating a large number of predictions by the proposed method improves the performance for this task. The predictors in the ensemble differ in several aspects, such as the method itself, the number of tasks the method considers (multi-task vs. single-task) and the subset of data considered (sub-sampling). We show that all these different aspects contribute to the success of the final ensemble. In addition, we attempt to use the drug screen data together with two novel signatures produced from the drug-induced gene expression profiles of cancer cell lines. Finally, we evaluate the method predictions by in vitro experiments in addition to the tests on data sets. The predictions of the methods, the signatures and the software are available from http://mtan.etu.edu.tr/drug-response-prediction/.
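The idea of averaging many predictors trained on different sub-samples can be illustrated with a minimal sketch. This is not the authors' method or software: the base learner (a 1-D least-squares line), the function names, the 70% sub-sampling fraction and the number of models are all assumptions chosen only to keep the example small.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b on 1-D data (hypothetical base learner)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        # Degenerate sub-sample: fall back to a constant prediction.
        return 0.0, my
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx


def fit_subsampled_ensemble(xs, ys, n_models=25, frac=0.7, seed=0):
    """Fit one base learner per random sub-sample (sampling without replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    k = max(2, int(frac * n))  # assumed sub-sample size
    models = []
    for _ in range(n_models):
        idx = rng.sample(range(n), k)
        models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def ensemble_predict(models, x):
    """Average the predictions of all ensemble members."""
    return mean(a * x + b for a, b in models)
```

In the setting described above, the ensemble members would additionally differ in the learning method and in whether they are single-task or multi-task; the sketch varies only the sub-sample.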


An Integrated Transfer Learning and Multitask Learning Approach for Pharmacokinetic Parameter Prediction

arXiv.org Machine Learning

Background: Pharmacokinetic evaluation is one of the key processes in drug discovery and development. However, current absorption, distribution, metabolism and excretion prediction models still have limited accuracy. Aim: This study aims to construct an integrated transfer learning and multitask learning approach for developing quantitative structure-activity relationship models to predict four human pharmacokinetic parameters. Methods: The pharmacokinetic dataset included 1104 U.S. FDA approved small molecule drugs and comprised four human pharmacokinetic parameter subsets (oral bioavailability, plasma protein binding rate, apparent volume of distribution at steady state and elimination half-life). The pre-trained model was trained on over 30 million bioactivity data points. An integrated transfer learning and multitask learning approach was established to enhance model generalization. Results: The pharmacokinetic dataset was split into three parts (60:20:20) for training, validation and testing by the improved Maximum Dissimilarity algorithm with the representative initial set selection algorithm and the weighted distance function. The multitask learning techniques enhanced the model's predictive ability. The integrated transfer learning and multitask learning model achieved the best accuracy, because deep neural networks have a general feature extraction ability and because transfer learning and multitask learning improved model generalization. Conclusions: The integrated transfer learning and multitask learning approach with the improved dataset splitting algorithm was introduced for the first time to predict pharmacokinetic parameters. This method can be further employed in drug discovery and development.
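A 60:20:20 split driven by dissimilarity can be sketched as follows. This shows only a plain greedy max-min selection, not the improved Maximum Dissimilarity algorithm of the abstract, whose initial set selection and weighted distance function are not specified here. The Euclidean distance, the centroid-based starting point, the function names and the rule for assigning the ordered points to the three parts are all assumptions.

```python
import math


def max_dissimilarity_order(points):
    """Greedy max-min ordering of points (lists of floats).

    Starts from the point nearest the centroid (an assumed choice), then
    repeatedly picks the point whose nearest already-picked neighbour is farthest.
    """
    n = len(points)
    dim = len(points[0])
    centroid = [sum(p[d] for p in points) / n for d in range(dim)]
    first = min(range(n), key=lambda i: math.dist(points[i], centroid))
    order = [first]
    # min_dist[i]: distance from point i to its nearest picked point; -1 marks picked.
    min_dist = [math.dist(p, points[first]) for p in points]
    min_dist[first] = -1.0
    while len(order) < n:
        nxt = max(range(n), key=lambda i: min_dist[i])
        order.append(nxt)
        for i in range(n):
            if min_dist[i] >= 0:
                min_dist[i] = min(min_dist[i], math.dist(points[i], points[nxt]))
        min_dist[nxt] = -1.0
    return order


def split_60_20_20(points):
    """Assign the most mutually dissimilar 60% to training, then 20% and 20%."""
    order = max_dissimilarity_order(points)
    n = len(order)
    n_train = round(0.6 * n)
    n_val = round(0.2 * n)
    train = order[:n_train]
    val = order[n_train:n_train + n_val]
    test = order[n_train + n_val:]
    return train, val, test
```

Each returned list holds indices into `points`, so the same split can be applied to the descriptor matrix and to the measured pharmacokinetic values.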


Towards meta-learning for multi-target regression problems

arXiv.org Machine Learning

Several multi-target regression methods were developed in recent years aiming at improving predictive performance by exploring inter-target correlation within the problem. However, none of these methods outperforms the others for all problems. This motivates the development of automatic approaches to recommend the most suitable multi-target regression method. In this paper, we propose a meta-learning system to recommend the best predictive method for a given multi-target regression problem. We performed experiments with a meta-dataset generated from a total of 648 synthetic datasets. These datasets were created to explore distinct inter-target characteristics toward recommending the most promising method. In experiments, we evaluated four different algorithms with different biases as meta-learners. Our meta-dataset is composed of 58 meta-features, based on statistical information, correlation characteristics, linear landmarking, and the distribution and smoothness of the data, and has four different meta-labels. Results showed that induced meta-models were able to recommend the best method for different base-level datasets with a balanced accuracy superior to 70% using a Random Forest meta-model, which statistically outperformed the meta-learning baselines.
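The balanced accuracy reported above is the mean of the per-class recalls, which keeps a frequently recommended method from dominating the score. A minimal sketch of the metric follows; the function name and the example labels are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    total = defaultdict(int)    # number of true instances per class
    correct = defaultdict(int)  # correctly predicted instances per class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)
```

For example, with true meta-labels `["A", "A", "A", "B"]` and recommendations `["A", "A", "B", "B"]`, the recalls are 2/3 for `A` and 1 for `B`, giving a balanced accuracy of 5/6, whereas plain accuracy would be 3/4.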


Prediction of Drug Synergy by Ensemble Learning

arXiv.org Machine Learning

One of the promising methods for the treatment of complex diseases such as cancer is combination therapy. Due to the combinatorial complexity, machine learning models can be useful in this field, where significant improvements have recently been achieved in the determination of synergistic combinations. In this study, we investigate the effectiveness of different compound representations in predicting drug synergy. On a large drug combination screen dataset, we first demonstrate the use of a promising representation that has not been used for this problem before, then we propose an ensemble of representation-model combinations that outperforms each of the baseline models. Scientific background: A drug combination is called synergistic if the effect of the drug combination on the reference cell is greater than the total effect obtained from the administration of the individual drugs. If the opposite situation is observed, the drug combination is called antagonistic.
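The definition of synergy and antagonism given above can be written as a small comparison. This is a sketch of that verbal definition only: the function name, the tolerance and the "additive" label for the tie case are assumptions, and synergy scores used in practice are typically computed against a specific reference model rather than a plain sum.

```python
def classify_combination(effect_a, effect_b, effect_combo, tol=1e-9):
    """Compare the combined effect with the sum of the single-drug effects.

    All effects are assumed to be measured on the same scale and cell line.
    """
    expected = effect_a + effect_b
    if effect_combo > expected + tol:
        return "synergistic"
    if effect_combo < expected - tol:
        return "antagonistic"
    return "additive"  # assumed label when the two are equal within tol
```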


Meta-Learning: A Survey

arXiv.org Machine Learning

Meta-learning, or learning to learn, is the science of systematically observing how different machine learning approaches perform on a wide range of learning tasks, and then learning from this experience, or meta-data, to learn new tasks much faster than otherwise possible. Not only does this dramatically speed up and improve the design of machine learning pipelines or neural architectures, it also allows us to replace hand-engineered algorithms with novel approaches learned in a data-driven way. In this chapter, we provide an overview of the state of the art in this fascinating and continuously evolving field.