APObind: A Dataset of Ligand Unbound Protein Conformations for Machine Learning Applications in De Novo Drug Design Artificial Intelligence

... methods that perform important tasks related to drug design such as receptor binding site detection, small molecule docking and binding affinity prediction. However, these methods are usually trained on only ligand bound (or holo) conformations of the protein and therefore are not guaranteed to perform well when the protein structure is in its native unbound conformation (or apo), which is usually the conformation available for a newly identified receptor. A primary reason for this is that the local structure of the binding site usually changes upon ligand binding. To facilitate solutions for this problem, we propose a dataset called APObind that aims to provide apo conformations of proteins present in the PDBbind dataset, a popular dataset used in drug design. Furthermore, ...

A drawback of these methods, however, is that they tend not to generalise well to data that does not resemble the data distribution used for training. The viability of such models therefore depends on well curated training data that translates well into real world applications.

Deep Learning models pertaining to SBDD workflows are usually trained on datasets containing 3D structures of protein-ligand complexes (Batool et al., 2019). PDBbind (Wang et al., 2005) is a predominantly used dataset that provides experimental binding affinity values for protein-ligand co-crystal structures present in the Protein Data Bank (PDB) (Berman et al., 2000). Deep learning architectures usually use voxelized (Jiménez et al., 2018) or graph like representations (Son & Kim, 2021) of the 3D structures present in PDBbind for computation to get benchmark performances.
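As a rough illustration of what a voxelized representation of a 3D structure can look like, the sketch below bins atom coordinates into a regular grid and counts atoms per cell. It is a minimal sketch under stated assumptions, not the featurization used by APObind or by the cited works: the function name `voxelize`, the 1 Å voxel size, the grid shape, and the toy coordinates are all hypothetical, and real pipelines typically add per-atom-type channels and smoothing.

```python
from collections import Counter


def voxelize(coords, origin, voxel_size, grid_shape):
    """Count atoms per voxel of a regular 3D grid (hypothetical helper).

    coords     : iterable of (x, y, z) atom positions
    origin     : (x, y, z) of the grid corner with the smallest coordinates
    voxel_size : edge length of one cubic voxel, same unit as coords
    grid_shape : number of voxels along each axis, e.g. (4, 4, 4)
    """
    counts = Counter()
    for atom in coords:
        # Floor division maps each coordinate to an integer voxel index.
        idx = tuple(int((c - o) // voxel_size) for c, o in zip(atom, origin))
        # Atoms that fall outside the grid are ignored.
        if all(0 <= i < n for i, n in zip(idx, grid_shape)):
            counts[idx] += 1
    return counts


# Toy coordinates (made up for illustration, not taken from any structure).
atoms = [(0.5, 0.5, 0.5), (0.9, 0.2, 0.1), (2.5, 0.5, 0.5), (9.0, 9.0, 9.0)]
grid = voxelize(atoms, origin=(0.0, 0.0, 0.0), voxel_size=1.0, grid_shape=(4, 4, 4))
```

Because a binding site can rearrange between apo and holo conformations, the same pocket may occupy different voxels in the two states, which is one way to picture why a model trained only on holo grids might transfer poorly.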

A new protein binding pocket similarity measure based on comparison of 3D atom clouds: application to ligand prediction Machine Learning

Motivation: Prediction of ligands for proteins of known 3D structure is important to understand structure-function relationships, predict molecular function, or design new drugs. Results: We explore a new approach for ligand prediction in which binding pockets are represented by atom clouds. Each target pocket is compared to an ensemble of pockets of known ligands. Pockets are aligned in 3D space with further use of convolution kernels between clouds of points. Performance of the new method for ligand prediction is compared to that of other available measures and to docking programs. We discuss two criteria to compare the quality of similarity measures: area under the ROC curve (AUC) and classification-based scores. We show that the latter are better suited to evaluate the methods with respect to ligand prediction. Our results on existing and new benchmarks indicate that the new method outperforms other approaches, including docking. Availability: The new method is available at Contact:
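The abstract does not spell out the kernel, so the following is only a sketch of one common form of convolution kernel between point clouds: a sum of Gaussians over all pairs of atoms, normalized so that a pocket compared with itself scores 1. The function names, the bandwidth `sigma`, and the toy coordinates are assumptions, and the 3D alignment step that the method performs first is not shown.

```python
import math


def cloud_kernel(a, b, sigma=1.0):
    """Sum of Gaussian kernels over all atom pairs of two clouds (assumed form)."""
    total = 0.0
    for p in a:
        for q in b:
            d2 = sum((pi - qi) ** 2 for pi, qi in zip(p, q))
            total += math.exp(-d2 / (2.0 * sigma ** 2))
    return total


def normalized_similarity(a, b, sigma=1.0):
    """Kernel value scaled to (0, 1]; equals 1 when a cloud is compared with itself."""
    return cloud_kernel(a, b, sigma) / math.sqrt(
        cloud_kernel(a, a, sigma) * cloud_kernel(b, b, sigma)
    )


# Toy pockets (hypothetical coordinates); pocket_b is pocket_a shifted by 5 units.
pocket_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 1.5, 0.0)]
pocket_b = [(x + 5.0, y, z) for x, y, z in pocket_a]

self_score = normalized_similarity(pocket_a, pocket_a)
shifted_score = normalized_similarity(pocket_a, pocket_b)
```

The shifted pocket scores lower than the self-comparison even though its shape is identical, which illustrates why the clouds need to be aligned before the kernel is evaluated.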

Decoding the Protein-ligand Interactions Using Parallel Graph Neural Networks Machine Learning

Protein-ligand interactions (PLIs) are fundamental to biochemical research, and their identification is crucial for estimating biophysical and biochemical properties for rational therapeutic design. Currently, experimental characterization of these properties is the most accurate method; however, it is very time-consuming and labor-intensive. A number of computational methods have been developed in this context, but most existing PLI prediction methods depend heavily on 2D protein sequence data. Here, we present a novel parallel graph neural network (GNN) to integrate knowledge representation and reasoning for PLI prediction, performing deep learning guided by expert knowledge and informed by 3D structural data. We develop two distinct GNN architectures: GNNF is the base implementation that employs distinct featurization to enhance domain-awareness, while GNNP is a novel implementation that can predict with no prior knowledge of the intermolecular interactions. The comprehensive evaluation demonstrated that the GNNs can successfully capture the binary interactions between ligand and protein 3D structures, with 0.979 test accuracy for GNNF and 0.958 for GNNP in predicting the activity of a protein-ligand complex. These models are further adapted for regression tasks to predict experimental binding affinities and pIC50, which are crucial for drug potency and efficacy. We achieve a Pearson correlation coefficient of 0.66 and 0.65 on experimental affinity and 0.50 and 0.51 on pIC50 with GNNF and GNNP, respectively, outperforming similar 2D sequence-based models. Our method can serve as an interpretable and explainable artificial intelligence (AI) tool for predicting activity, potency, and biophysical properties of lead candidates. To this end, we show the utility of GNNP on SARS-CoV-2 protein targets by screening a large compound library and comparing our predictions with the experimentally measured data.
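For readers unfamiliar with the regression metric quoted above, the Pearson correlation coefficient between measured and predicted values can be computed as in the sketch below. The function name `pearson_r` and the toy numbers are illustrative assumptions; they are not data or code from the paper.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances (the common 1/n factor cancels in the ratio).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical measured vs. predicted affinities, made up for illustration.
measured = [5.0, 6.0, 7.0, 8.0]
predicted = [5.5, 5.8, 7.4, 7.9]
r = pearson_r(measured, predicted)
```

A value of 1 means the predictions are a perfect increasing linear function of the measurements, and values near 0 mean no linear relationship. The coefficient measures linear agreement only, so it does not penalize a constant offset or scale error in the predictions.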

Conformational ensembles Structural Biology

G protein-coupled receptors (GPCRs) mediate signals across the cell membrane that regulate diverse biological processes and thus are important drug targets. Signaling involves ligand binding, GPCR binding to heterotrimeric G protein, nucleotide exchange, and dissociation of the G protein. High-resolution snapshots of GPCRs, alone and in complexes, provide insight into the activation mechanism but do not reveal the full landscape of conformations. Huang et al. used fluorine nuclear magnetic resonance spectroscopy to monitor the key states of a prototypical GPCR, the adenosine A2A receptor, in a lipid bilayer as a function of ligand, G protein, and nucleotide. The authors identified a conformational ensemble composed of at least five key functional states: two inactive states and three active states. Ligands and nucleotide shift the distribution of conformations. Understanding the conformational ensemble may inform drug design. Cell 10.1016/j.cell.2021.02.041 (2021).

Fooling the classifier: Ligand antagonism and adversarial examples Machine Learning

Machine learning algorithms are sensitive to so-called adversarial perturbations. This is reminiscent of cellular decision-making where antagonist ligands may prevent correct signaling, as during the early immune response. We draw a formal analogy between neural networks used in machine learning and the general class of adaptive proofreading networks. We then apply simple adversarial strategies from machine learning to models of ligand discrimination. We show how kinetic proofreading leads to "boundary tilting" and identify three types of perturbation (adversarial, non-adversarial and ambiguous). We then use a gradient-descent approach to compare different adaptive proofreading models, and we reveal the existence of two qualitatively different regimes characterized by the presence or absence of a critical point. These regimes are reminiscent of the "feature-to-prototype" transition identified in machine learning, corresponding to two strategies in ligand antagonism (broad vs. specialized). Overall, our work connects evolved cellular decision-making to classification in machine learning, showing that behaviours close to the decision boundary can be understood through the same mechanisms.
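To make the machine learning side of the analogy concrete, the sketch below applies a gradient-sign perturbation (in the style of the fast gradient sign method) to a tiny logistic classifier. It is not one of the adaptive proofreading models from the paper: the weights, the input, and the step size `eps` are hypothetical values chosen only so that a small, bounded change to the input moves it across the decision boundary.

```python
import math


def predict(w, b, x):
    """Probability of class 1 under a logistic model with weights w and bias b."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def gradient_sign_attack(w, b, x, y, eps):
    """Shift each input coordinate by eps in the direction that raises the loss.

    For the cross-entropy loss of a logistic model, the gradient with
    respect to the input x is (p - y) * w, where p = predict(w, b, x).
    """
    p = predict(w, b, x)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]


# Hypothetical classifier and input, for illustration only.
w, b = [2.0, -1.0], 0.0
x, y = [0.3, 0.1], 1  # correctly classified as class 1

x_adv = gradient_sign_attack(w, b, x, y, eps=0.3)
p_clean = predict(w, b, x)
p_adv = predict(w, b, x_adv)
```

Each coordinate moves by at most `eps`, yet the predicted class flips because the perturbation is aligned against the weight vector, the kind of behaviour near a decision boundary that the abstract relates to antagonist ligands.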