Collaborating Authors

 He, Xinheng


Integrating Protein Dynamics into Structure-Based Drug Design via Full-Atom Stochastic Flows

arXiv.org Artificial Intelligence

The dynamic nature of proteins, influenced by ligand interactions, is essential for understanding protein function and advancing drug discovery. Traditional structure-based drug design (SBDD) approaches typically target binding sites with rigid structures, which limits their practical application in drug development. While molecular dynamics simulation can theoretically capture all biologically relevant conformations, the transition rate between them is dictated by the intrinsic energy barriers, making the sampling process computationally expensive. To overcome these challenges, we propose to use generative modeling for SBDD that accounts for conformational changes of protein pockets. We curate a dataset of apo and multiple holo states of protein-ligand complexes, simulated by molecular dynamics, and propose a full-atom flow model (and a stochastic version), named DynamicFlow, that learns to transform apo pockets and noisy ligands into holo pockets and corresponding 3D ligand molecules. Additionally, the resultant holo-like states provide superior inputs for traditional SBDD approaches, playing a significant role in practical drug discovery.
Modern deep learning is advancing several areas within drug discovery. Among these, structure-based drug design (SBDD) (Anderson, 2003) emerges as a particularly significant and challenging domain. SBDD aims to discover drug-like ligand molecules specifically tailored to target binding sites. However, the complexity of chemical space and the dynamic nature of molecular conformations make traditional methods such as high-throughput screening and virtual screening inefficient. Additionally, relying on compound databases limits the diversity of identified molecules. Thus, deep generative models, such as autoregressive models (Luo et al., 2021; Peng et al., 2022) and diffusion models (Guan et al., 2023; Schneuing et al., 2022), have been introduced as tools for de novo 3D ligand molecule design based on binding pockets, significantly transforming research paradigms.
Most SBDD methods based on deep generative models, however, assume that proteins are rigid (Peng et al., 2022; Guan et al., 2024), even though the dynamic behavior of proteins is crucial for practical drug discovery (Karelina et al., 2023; Boehr et al., 2009). Thermodynamic fluctuations cause proteins to exist as an ensemble of conformational states, and these states may interact with different drug molecules. During binding, the protein's structure may undergo fine-tuning, adopting different conformations to optimize its interaction with the drug, a phenomenon referred to as induced fit (Sherman et al., 2006).
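The idea of a flow that carries an apo state to a holo state can be illustrated with a generic flow-matching sketch. This is not DynamicFlow's actual formulation, which the abstract does not specify. It assumes a straight-line interpolant between two coordinate vectors, and the function names and toy coordinates below are hypothetical.

```python
# Generic flow-matching sketch (assumption: linear interpolant; not the paper's method).

def interpolate(x0, x1, t):
    """Point on the straight path from x0 (t=0) to x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]

def target_velocity(x0, x1):
    """Velocity of the straight path, d x_t / dt = x1 - x0 (constant in t)."""
    return [b - a for a, b in zip(x0, x1)]

def euler_integrate(x0, velocity_fn, n_steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / n_steps
    for i in range(n_steps):
        v = velocity_fn(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

# Toy flattened coordinates standing in for apo and holo pocket atoms (made-up values).
apo = [0.0, 1.0, 2.0]
holo = [0.5, 1.5, 1.0]

# A trained network would approximate this velocity; here the exact one is used.
v_star = target_velocity(apo, holo)
midpoint = interpolate(apo, holo, 0.5)
final = euler_integrate(apo, lambda x, t: v_star, n_steps=10)
```

In training, a model would regress its predicted velocity at `interpolate(x0, x1, t)` onto `target_velocity(x0, x1)`. At sampling time, the learned velocity would replace `v_star` in `euler_integrate`.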


Improving AlphaFlow for Efficient Protein Ensembles Generation

arXiv.org Artificial Intelligence

Investigating the conformational landscapes of proteins is a crucial way to understand their biological functions and properties. AlphaFlow stands out as a sequence-conditioned generative model that introduces flexibility into structure prediction models by fine-tuning AlphaFold under the flow-matching framework. Despite the efficient sampling afforded by flow matching, AlphaFlow still requires multiple runs of AlphaFold to generate a single conformation. Because of AlphaFold's heavy computational cost, AlphaFlow's applicability is limited when sampling larger sets of protein ensembles or longer chains within a constrained timeframe. In this work, we propose a feature-conditioned generative model called AlphaFlow-Lit to realize efficient generation of protein ensembles. Instead of fully fine-tuning the entire structure, we focus solely on the lightweight structure module to reconstruct the conformation. AlphaFlow-Lit performs on par with AlphaFlow and surpasses its distilled version without pretraining, all while achieving a sampling acceleration of around 47 times. This gain in efficiency showcases the potential of AlphaFlow-Lit for faster and more scalable generation of protein ensembles.


An ensemble of VisNet, Transformer-M, and pretraining models for molecular property prediction in OGB Large-Scale Challenge @ NeurIPS 2022

arXiv.org Artificial Intelligence

In this technical report, we provide our solution for the OGB-LSC 2022 Graph Regression Task. The goal of this task is to predict a quantum chemical property, the HOMO-LUMO gap, of a given molecule on the PCQM4Mv2 dataset. In the competition, we designed two kinds of models: Transformer-M-ViSNet, a geometry-enhanced graph neural network for fully connected molecular graphs, and Pretrained-3D-ViSNet, a ViSNet pretrained by distilling geometric information from optimized structures. With an ensemble of 22 models, the ViSNet Team achieved an MAE of 0.0723 eV on the test-challenge set, reducing the error by 39.75% compared with the best method in last year's competition.
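The reported numbers can be related with a short sketch. The report does not state how the 22 models were combined, so the unweighted mean below is an assumption, and the toy predictions are made up for illustration. Only the final calculation uses figures from the abstract: an MAE of 0.0723 eV and a 39.75% relative reduction together imply a previous best of about 0.12 eV.

```python
# Sketch of ensembling and MAE (assumption: unweighted mean; toy data is hypothetical).

def ensemble_mean(predictions):
    """Average per-molecule predictions across models (rows = models)."""
    n_models = len(predictions)
    return [sum(col) / n_models for col in zip(*predictions)]

def mae(pred, target):
    """Mean absolute error between predictions and targets."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)

def implied_baseline(new_error, relative_reduction):
    """Baseline error b such that new_error = b * (1 - relative_reduction)."""
    return new_error / (1.0 - relative_reduction)

# Three hypothetical models predicting HOMO-LUMO gaps (eV) for two molecules.
toy_predictions = [
    [5.0, 3.0],
    [5.2, 2.8],
    [4.8, 3.2],
]
toy_targets = [5.1, 2.9]

combined = ensemble_mean(toy_predictions)
toy_mae = mae(combined, toy_targets)

# Figures from the abstract: 0.0723 eV MAE, 39.75% reduction.
previous_best = implied_baseline(0.0723, 0.3975)
```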