Goto

Collaborating Authors

 dta prediction


Exploiting Pre-trained Models for Drug Target Affinity Prediction with Nearest Neighbors

Pei, Qizhi, Wu, Lijun, He, Zhenyu, Zhu, Jinhua, Xia, Yingce, Xie, Shufang, Yan, Rui

arXiv.org Artificial Intelligence

Drug-Target binding Affinity (DTA) prediction is essential for drug discovery. Despite the application of deep learning methods to DTA prediction, the achieved accuracy remain suboptimal. In this work, inspired by the recent success of retrieval methods, we propose $k$NN-DTA, a non-parametric embedding-based retrieval method adopted on a pre-trained DTA prediction model, which can extend the power of the DTA model with no or negligible cost. Different from existing methods, we introduce two neighbor aggregation ways from both embedding space and label space that are integrated into a unified framework. Specifically, we propose a \emph{label aggregation} with \emph{pair-wise retrieval} and a \emph{representation aggregation} with \emph{point-wise retrieval} of the nearest neighbors. This method executes in the inference phase and can efficiently boost the DTA prediction performance with no training cost. In addition, we propose an extension, Ada-$k$NN-DTA, an instance-wise and adaptive aggregation with lightweight learning. Results on four benchmark datasets show that $k$NN-DTA brings significant improvements, outperforming previous state-of-the-art (SOTA) results, e.g, on BindingDB IC$_{50}$ and $K_i$ testbeds, $k$NN-DTA obtains new records of RMSE $\bf{0.684}$ and $\bf{0.750}$. The extended Ada-$k$NN-DTA further improves the performance to be $\bf{0.675}$ and $\bf{0.735}$ RMSE. These results strongly prove the effectiveness of our method. Results in other settings and comprehensive studies/analyses also show the great potential of our $k$NN-DTA approach.


SSM-DTA: Breaking the Barriers of Data Scarcity in Drug-Target Affinity Prediction

Pei, Qizhi, Wu, Lijun, Zhu, Jinhua, Xia, Yingce, Xie, Shufang, Qin, Tao, Liu, Haiguang, Liu, Tie-Yan, Yan, Rui

arXiv.org Artificial Intelligence

Accurate prediction of Drug-Target Affinity (DTA) is of vital importance in early-stage drug discovery, facilitating the identification of drugs that can effectively interact with specific targets and regulate their activities. While wet experiments remain the most reliable method, they are time-consuming and resource-intensive, resulting in limited data availability that poses challenges for deep learning approaches. Existing methods have primarily focused on developing techniques based on the available DTA data, without adequately addressing the data scarcity issue. To overcome this challenge, we present the SSM-DTA framework, which incorporates three simple yet highly effective strategies: (1) A multi-task training approach that combines DTA prediction with masked language modeling (MLM) using paired drug-target data. (2) A semi-supervised training method that leverages large-scale unpaired molecules and proteins to enhance drug and target representations. This approach differs from previous methods that only employed molecules or proteins in pre-training. (3) The integration of a lightweight cross-attention module to improve the interaction between drugs and targets, further enhancing prediction accuracy. Through extensive experiments on benchmark datasets such as BindingDB, DAVIS, and KIBA, we demonstrate the superior performance of our framework. Additionally, we conduct case studies on specific drug-target binding activities, virtual screening experiments, drug feature visualizations, and real-world applications, all of which showcase the significant potential of our work. In conclusion, our proposed SSM-DTA framework addresses the data limitation challenge in DTA prediction and yields promising results, paving the way for more efficient and accurate drug discovery processes. Our code is available at $\href{https://github.com/QizhiPei/SSM-DTA}{Github}$.


A biologically-inspired multi-modal evaluation of molecular generative machine learning

Vinogradova, Elizaveta, Artykbayev, Abay, Amanatay, Alisher, Karatayev, Mukhamejan, Mametkulov, Maxim, Li, Albina, Suleimenov, Anuar, Salimzhanov, Abylay, Pats, Karina, Zhumagambetov, Rustam, Molnár, Ferdinand, Peshkov, Vsevolod, Fazli, Siamac

arXiv.org Artificial Intelligence

While generative models have recently become ubiquitous in many scientific areas, less attention has been paid to their evaluation. For molecular generative models, the state-of-the-art examines their output in isolation or in relation to its input. However, their biological and functional properties, such as ligand-target interaction is not being addressed. In this study, a novel biologically-inspired benchmark for the evaluation of molecular generative models is proposed. Specifically, three diverse reference datasets are designed and a set of metrics are introduced which are directly relevant to the drug discovery process. In particular we propose a recreation metric, apply drug-target affinity prediction and molecular docking as complementary techniques for the evaluation of generative outputs. While all three metrics show consistent results across the tested generative models, a more detailed comparison of drug-target affinity binding and molecular docking scores revealed that unimodal predictiors can lead to erroneous conclusions about target binding on a molecular level and a multi-modal approach is thus preferrable. The key advantage of this framework is that it incorporates prior physico-chemical domain knowledge into the benchmarking process by focusing explicitly on ligand-target interactions and thus creating a highly efficient tool not only for evaluating molecular generative outputs in particular, but also for enriching the drug discovery process in general.