conformational space
- Energy (0.68)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
- Information Technology > Data Science (0.68)
Active Learning for Machine Learning Driven Molecular Dynamics
Bachelor, Kevin, Murdeshwar, Sanya, Sabo, Daniel, Marinescu, Razvan
Machine-learned coarse-grained (CG) potentials are fast, but degrade over time when simulations reach under-sampled bio-molecular conformations, and generating widespread all-atom (AA) data to combat this is computationally infeasible. We propose a novel active learning (AL) framework for CG neural network potentials in molecular dynamics (MD). Building on the CGSchNet model, our method employs root mean squared deviation (RMSD)-based frame selection from MD simulations in order to generate data on-the-fly by querying an oracle during the training of a neural network potential. This framework preserves CG-level efficiency while correcting the model at precise, RMSD-identified coverage gaps. By training CGSchNet, a coarse-grained neural network potential, we empirically show that our framework explores previously unseen configurations and trains the model on unexplored regions of conformational space. Our active learning framework enables a CGSchNet model trained on the Chignolin protein to achieve a 33.05% improvement in the Wasserstein-1 (W1) metric in Time-lagged Independent Component Analysis (TICA) space on an in-house benchmark suite.
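The RMSD-based frame selection mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `kabsch_rmsd` and `select_frames`, the use of the minimum RMSD to the training set as the novelty score, and the top-k rule are all assumptions made for illustration. Frames are assumed to be NumPy arrays of shape (n_beads, 3).

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two (n_beads, 3) coordinate sets after optimal superposition."""
    # Remove translation by centering both structures.
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    # Kabsch algorithm: the SVD of the covariance matrix gives the best rotation.
    U, _, Vt = np.linalg.svd(P.T @ Q)
    # Flip the last axis if needed so the result is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(U @ Vt))
    R = U @ np.diag([1.0, 1.0, d]) @ Vt
    return np.sqrt(np.mean(np.sum((P @ R - Q) ** 2, axis=1)))


def select_frames(sim_frames, train_frames, k):
    """Hypothetical selection rule: pick the k simulated frames whose
    minimum RMSD to any training frame is largest (least covered)."""
    min_rmsd = np.array(
        [min(kabsch_rmsd(f, t) for t in train_frames) for f in sim_frames]
    )
    order = np.argsort(min_rmsd)[::-1][:k]
    return order, min_rmsd
```

In an active learning loop of the kind described, the selected frames would then be sent to the oracle for labelling and added to the training set. The pairwise loop above is quadratic in the number of frames and is only meant to show the idea.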
Chemistry-Enhanced Diffusion-Based Framework for Small-to-Large Molecular Conformation Generation
Zhu, Yifei, Zhang, Jiahui, Peng, Jiawei, Li, Mengge, Xu, Chao, Lan, Zhenggang
Obtaining 3D conformations of realistic polyatomic molecules at the quantum chemistry level remains challenging, and although recent machine learning advances offer promise, predicting large-molecule structures still requires substantial computational effort. Here, we introduce StoL, a diffusion model-based framework that enables rapid and knowledge-free generation of large molecular structures from small-molecule data. Remarkably, StoL assembles molecules in a LEGO-style fashion from scratch, without seeing the target molecules or any structures of comparable size during training. Given a SMILES input, it decomposes the molecule into chemically valid fragments, generates their 3D structures with a diffusion model trained on small molecules, and assembles them into diverse conformations. This fragment-based strategy eliminates the need for large-molecule training data while maintaining high scalability and transferability. By embedding chemical principles into key steps, StoL ensures faster convergence, chemically rational structures, and broad configurational coverage, as confirmed against DFT calculations.
- Asia > China > Guangdong Province > Guangzhou (0.04)
- North America > United States > Connecticut > New Haven County > Wallingford (0.04)
Amortized Inference for Heterogeneous Reconstruction in Cryo-EM
In a single particle cryo-electron microscopy (cryo-EM) experiment, an aqueous solution of purified biomolecules is flash-frozen in a thin layer of vitreous ice and imaged with a transmission electron microscope (Figure 1 (a)). A cryo-EM experiment outputs a large set of unlabeled images, each containing a 2D projection of a unique molecule, whose 3D structure is sampled from some thermodynamic distribution (i.e. a conformation) and viewed from an unknown orientation (i.e. a
- North America > United States (0.14)
- Africa > South Africa > Western Cape > Indian Ocean (0.04)
MD-LLM-1: A Large Language Model for Molecular Dynamics
Murtada, Mhd Hussein, Brotzakis, Z. Faidon, Vendruscolo, Michele
Molecular dynamics (MD) is a powerful approach for modelling molecular systems, but it remains computationally intensive on the spatial and time scales of many macromolecular systems of biological interest. To explore the opportunities offered by deep learning to address this problem, we introduce a Molecular Dynamics Large Language Model (MD-LLM) framework to illustrate how LLMs can be leveraged to learn protein dynamics and discover states not seen in training. By applying MD-LLM-1, the first implementation of this approach, obtained by fine-tuning Mistral 7B, to the T4 lysozyme and Mad2 protein systems, we show that training on one conformational state enables the prediction of other conformational states. These results indicate that MD-LLM-1 can learn the principles for the exploration of the conformational landscapes of proteins, although it does not yet explicitly model their thermodynamics and kinetics.
The Latent Road to Atoms: Backmapping Coarse-grained Protein Structures with Latent Diffusion
Han, Xu, Sun, Yuancheng, Chen, Kai, Liu, Kang, Ye, Qiwei
Coarse-grained (CG) molecular dynamics simulations offer computational efficiency for exploring protein conformational ensembles and thermodynamic properties. Though coarse representations enable large-scale simulations across extended temporal and spatial ranges, the sacrifice of atomic-level details limits their utility in tasks such as ligand docking and protein-protein interaction prediction. Backmapping, the process of reconstructing all-atom structures from coarse-grained representations, is crucial for recovering these fine details. While recent machine learning methods have made strides in protein structure generation, challenges persist in reconstructing diverse atomistic conformations that maintain geometric accuracy and chemical validity. In this paper, we present Latent Diffusion Backmapping (LDB), a novel approach leveraging denoising diffusion within latent space to address these challenges. By combining discrete latent encoding with diffusion, LDB bypasses the need for equivariant and internal coordinate manipulation, significantly simplifying the training and sampling processes as well as facilitating better and wider exploration in configuration space. We evaluate LDB's state-of-the-art performance on three distinct protein datasets, demonstrating its ability to efficiently reconstruct structures with high structural accuracy and chemical validity. Moreover, LDB shows exceptional versatility in capturing diverse protein ensembles, highlighting its capability to explore intricate conformational spaces. Coarse-Grained Molecular Dynamics (CG-MD) simulation has become an indispensable tool in computational biology for simulating large biomolecular systems (Das & Baker, 2008; Liwo et al., 2014; Kmiecik et al., 2016; Souza et al., 2021; Majewski et al., 2023; Arts et al., 2023).
By grouping atoms into super-atoms or beads, CG models significantly decrease computational requirements and allow the observation of long-time processes such as folding, aggregation, and self-assembly (Lequieu et al., 2019; Shmilovich et al., 2020; Mohr et al., 2022). However, CG representations inherently sacrifice atomistic details of protein structures, limiting their application to many important downstream tasks in drug discovery, such as molecular recognition, signaling pathway deciphering, and allosteric site prediction (Badaczewska-Dawid et al., 2020; Vickery & Stansfeld, 2021; Zambaldi et al., 2024).
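The grouping of atoms into beads can be made concrete with a small sketch of a center-of-mass mapping. This shows only the forward (all-atom to CG) direction, not the LDB backmapping model. The function name `coarse_grain` and the atom-to-bead assignment array `bead_index` are hypothetical, and a center-of-mass mapping is one common choice assumed here rather than something the source specifies.

```python
import numpy as np


def coarse_grain(coords, masses, bead_index, n_beads):
    """Map all-atom coordinates (n_atoms, 3) to bead centers of mass (n_beads, 3).

    bead_index[i] is the (assumed) bead that atom i belongs to.
    """
    coords = np.asarray(coords, dtype=float)
    masses = np.asarray(masses, dtype=float)
    bead_index = np.asarray(bead_index)
    # Total mass per bead.
    bead_mass = np.bincount(bead_index, weights=masses, minlength=n_beads)
    # Mass-weighted coordinate sum per bead (unbuffered scatter-add).
    weighted = np.zeros((n_beads, 3))
    np.add.at(weighted, bead_index, coords * masses[:, None])
    return weighted / bead_mass[:, None]
```

Because many atomistic arrangements map to the same bead positions, the inverse problem is one-to-many, which is why backmapping is usually treated as a generative task.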
Deep Signature: Characterization of Large-Scale Molecular Dynamics
Qin, Tiexin, Zhu, Mengxu, Li, Chunyang, Lyons, Terry, Yan, Hong, Li, Haoliang
Understanding protein dynamics is essential for deciphering protein functional mechanisms and developing molecular therapies. However, the complex high-dimensional dynamics and interatomic interactions of biological processes pose a significant challenge for existing computational techniques. In this paper, we approach this problem for the first time by introducing Deep Signature, a novel computationally tractable framework that characterizes complex dynamics and interatomic interactions based on their evolving trajectories. Specifically, our approach incorporates soft spectral clustering that locally aggregates cooperative dynamics to reduce the size of the system, as well as a signature transform that collects iterated integrals to provide a global characterization of the non-smooth interactive dynamics. Theoretical analysis demonstrates that Deep Signature exhibits several desirable properties, including invariance to translation, near invariance to rotation, equivariance to permutation of atomic coordinates, and invariance under time reparameterization. Furthermore, experimental results on three benchmarks of biological processes verify that our approach can achieve superior performance compared to baseline methods. Biological processes are fundamentally driven by the dynamical changes of macromolecules, particularly proteins and enzymes, within their respective functional conformation spaces. Typical examples of such processes include protein-ligand binding, molecule transport and enzymatic reactions, and modern computational biologists investigate their underlying functional mechanisms by molecular dynamics (MD) simulations (Dror et al., 2012; Lewandowski et al., 2015). Built upon density functional theory (Car & Parrinello, 1985), MD has demonstrated remarkable capability in providing accurate atomic trajectories in three-dimensional (3D) conformational space and consistent agreement with experimental observations (Frenkel & Smit, 2023).
The computational analysis of MD data has been a subject of extensive research for decades, with the goal of characterizing systems from trajectory information.
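The signature transform mentioned in the abstract collects iterated integrals of a trajectory. As a hedged illustration, the sketch below computes only the first two signature levels of a piecewise-linear path in closed form. It is not the Deep Signature model, which also includes learned clustering, and the function name `signature_depth2` is hypothetical.

```python
import numpy as np


def signature_depth2(path):
    """First two signature levels of a piecewise-linear path of shape (T, d).

    level1[i]    = integral of dX^i          (total displacement)
    level2[i, j] = iterated integral of dX^i dX^j over s < t
    """
    path = np.asarray(path, dtype=float)
    increments = np.diff(path, axis=0)        # segment displacements, (T-1, d)
    level1 = increments.sum(axis=0)
    # Offset of each segment's start point from the path's origin.
    start_offsets = path[:-1] - path[0]
    # Per segment: offset (x) increment + 0.5 * increment (x) increment.
    level2 = start_offsets.T @ increments + 0.5 * increments.T @ increments
    return level1, level2
```

Two of the properties listed in the abstract can be checked directly on this sketch: adding a constant vector to every point leaves both levels unchanged (translation invariance), and inserting extra points along an existing straight segment leaves them unchanged (invariance under time reparameterization).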
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Asia > South Korea > Seoul > Seoul (0.04)
- Asia > China > Sichuan Province > Chengdu (0.04)
- Asia > China > Hong Kong (0.04)
Amortized Inference for Heterogeneous Reconstruction in Cryo-EM
Levy, Axel, Wetzstein, Gordon, Martel, Julien, Poitevin, Frederic, Zhong, Ellen D.
Cryo-electron microscopy (cryo-EM) is an imaging modality that provides unique insights into the dynamics of proteins and other building blocks of life. The algorithmic challenge of jointly estimating the poses, 3D structure, and conformational heterogeneity of a biomolecule from millions of noisy and randomly oriented 2D projections in a computationally efficient manner, however, remains unsolved. Our method, cryoFIRE, performs ab initio heterogeneous reconstruction with unknown poses in an amortized framework, thereby avoiding the computationally expensive step of pose search while enabling the analysis of conformational heterogeneity. Poses and conformation are jointly estimated by an encoder while a physics-based decoder aggregates the images into an implicit neural representation of the conformational space. We show that our method can provide one order of magnitude speedup on datasets containing millions of images without any loss of accuracy. We validate that the joint estimation of poses and conformations can be amortized over the size of the dataset. For the first time, we prove that an amortized method can extract interpretable dynamic information from experimental datasets.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Africa > South Africa > Western Cape > Indian Ocean (0.04)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.93)
- Energy (0.68)
Artificial intelligence guided conformational mining of intrinsically disordered proteins - Communications Biology
Artificial intelligence recently achieved the breakthrough of predicting the three-dimensional structures of proteins. The next frontier is presented by intrinsically disordered proteins (IDPs), which, representing 30% to 50% of proteomes, readily access vast conformational space. Molecular dynamics (MD) simulations are promising in sampling IDP conformations, but only at extremely high computational cost. Here, we developed generative autoencoders that learn from short MD simulations and generate full conformational ensembles. An encoder represents IDP conformations as vectors in a reduced-dimensional latent space. The mean vector and covariance matrix of the training dataset are calculated to define a multivariate Gaussian distribution, from which vectors are sampled and fed to a decoder to generate new conformations. The ensembles of generated conformations cover those sampled by long MD simulations and are validated by small-angle X-ray scattering profile and NMR chemical shifts. This work illustrates the vast potential of artificial intelligence in conformational mining of IDPs. Generative autoencoders create full conformational ensembles of intrinsically disordered proteins from short molecular dynamics simulations.
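The sampling step described above (fit a multivariate Gaussian to the latent vectors of the training set, then draw new vectors for the decoder) can be sketched as follows. The encoder and decoder networks are omitted, and the names `fit_latent_gaussian` and `sample_latents` are hypothetical. The input `latents` is assumed to be an array of shape (n_conformations, latent_dim) already produced by an encoder.

```python
import numpy as np


def fit_latent_gaussian(latents):
    """Mean vector and covariance matrix of encoded training conformations."""
    latents = np.asarray(latents, dtype=float)
    mean = latents.mean(axis=0)
    cov = np.cov(latents, rowvar=False)   # columns are latent dimensions
    return mean, cov


def sample_latents(mean, cov, n, seed=0):
    """Draw n latent vectors from N(mean, cov); these would be fed to a decoder."""
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(mean, cov, size=n)
```

Passing the sampled vectors through a trained decoder would then yield new conformations. Treating the latent distribution as a single Gaussian is the modelling choice stated in the abstract.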
Protein Design by Provable Algorithms
Proteins are a class of large molecules that are involved in the vast majority of biological functions, from cell replication to photosynthesis to cognition. The chemical structure of proteins is very systematic [5]: they consist of a chain of atoms known as the backbone, which consists of three-atom (nitrogen-carbon-carbon) repeats known as residues, each of which features a sidechain of atoms emanating from the first carbon. In general, there are 20 different options for sidechains, and a residue with a particular type of sidechain is known as an amino acid (so there are also 20 different amino acid types). For billions of years, the process of evolution has optimized the sequence of amino acids that make up naturally occurring proteins to suit the needs of the organisms that make them. So we ask: Can we use computation to design non-naturally occurring proteins that suit our biomedical and industrial needs? This question is a combinatorial optimization problem, because the output of a protein design computation is a sequence of amino acids. Due to the vast diversity of naturally occurring proteins, it is possible, and very useful, to begin a protein design computation with a naturally occurring protein and then to modify it to achieve the desired function. In this article, we focus on protein design algorithms that perform this optimization using detailed modeling of the 3D structure of the protein [5, 8]. Thus, they will begin with a starting structure, a 3D structure of a (typically naturally occurring) protein we wish to modify. To illustrate this concept, imagine we wish to perform a simple example modification to a protein to make it more stable, so it can still function at higher temperatures.
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > United States > Oklahoma > Payne County > Cushing (0.04)
- North America > United States > North Carolina > Durham County > Durham (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)