Goto

Collaborating Authors

 De Fabritiis, Gianni


Navigating protein landscapes with a machine-learned transferable coarse-grained model

arXiv.org Machine Learning

The most popular and universally predictive protein simulation models employ all-atom molecular dynamics (MD), but they come at extreme computational cost. The development of a universal, computationally efficient coarse-grained (CG) model with similar prediction performance has been a long-standing challenge. By combining recent deep learning methods with a large and diverse training set of all-atom protein simulations, we here develop a bottom-up CG force field with chemical transferability, which can be used for extrapolative molecular dynamics on new sequences not used during model parametrization. We demonstrate that the model successfully predicts folded structures, intermediates, metastable folded and unfolded basins, and the fluctuations of intrinsically disordered proteins while it is several orders of magnitude faster than an all-atom model. This showcases the feasibility of a universal and computationally efficient machine-learned CG model for proteins.


NNP/MM: Accelerating molecular dynamics simulations with machine learning potentials and molecular mechanic

arXiv.org Artificial Intelligence

Machine learning potentials have emerged as a means to enhance the accuracy of biomolecular simulations. However, their application is constrained by the significant computational cost arising from the vast number of parameters compared to traditional molecular mechanics. To tackle this issue, we introduce an optimized implementation of the hybrid method (NNP/MM), which combines neural network potentials (NNP) and molecular mechanics (MM). This approach models a portion of the system, such as a small molecule, using NNP while employing MM for the remaining system to boost efficiency. By conducting molecular dynamics (MD) simulations on various protein-ligand complexes and metadynamics (MTD) simulations on a ligand, we showcase the capabilities of our implementation of NNP/MM. It has enabled us to increase the simulation speed by 5 times and achieve a combined sampling of one microsecond for each complex, marking the longest simulations ever reported for this class of simulation.


Machine Learning Small Molecule Properties in Drug Discovery

arXiv.org Artificial Intelligence

Machine learning (ML) is a promising approach for predicting small molecule properties in drug discovery. Here, we provide a comprehensive overview of various ML methods introduced for this purpose in recent years. We review a wide range of properties, including binding affinities, solubility, and ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity). We discuss existing popular datasets and molecular descriptors and embeddings, such as chemical fingerprints and graph-based neural networks. We highlight also challenges of predicting and optimizing multiple properties during hit-to-lead and lead optimization stages of drug discovery and explore briefly possible multi-objective optimization techniques that can be used to balance diverse properties while optimizing lead candidates. Finally, techniques to provide an understanding of model predictions, especially for critical decision-making in drug discovery are assessed. Overall, this review provides insights into the landscape of ML models for small molecule property predictions in drug discovery. So far, there are multiple diverse approaches, but their performances are often comparable. Neural networks, while more flexible, do not always outperform simpler models. This shows that the availability of high-quality training data remains crucial for training accurate models and there is a need for standardized benchmarks, additional performance metrics, and best practices to enable richer comparisons between the different techniques and models that can shed a better light on the differences between the many techniques.


Binding-and-folding recognition of an intrinsically disordered protein using online learning molecular dynamics

arXiv.org Artificial Intelligence

Intrinsically disordered proteins participate in many biological processes by folding upon binding with other proteins. However, coupled folding and binding processes are not well understood from an atomistic point of view. One of the main questions is whether folding occurs prior to or after binding. Here we use a novel unbiased high-throughput adaptive sampling approach to reconstruct the binding and folding between the disordered transactivation domain of \mbox{c-Myb} and the KIX domain of the CREB-binding protein. The reconstructed long-term dynamical process highlights the binding of a short stretch of amino acids on \mbox{c-Myb} as a folded $\alpha$-helix. Leucine residues, specially Leu298 to Leu302, establish initial native contacts that prime the binding and folding of the rest of the peptide, with a mixture of conformational selection on the N-terminal region with an induced fit of the C-terminal.


Machine Learning Coarse-Grained Potentials of Protein Thermodynamics

arXiv.org Artificial Intelligence

A generalized understanding of protein dynamics is an unsolved scientific problem, the solution of which is critical to the interpretation of the structure-function relationships that govern essential biological processes. Here, we approach this problem by constructing coarse-grained molecular potentials based on artificial neural networks and grounded in statistical mechanics. For training, we build a unique dataset of unbiased all-atom molecular dynamics simulations of approximately 9 ms for twelve different proteins with multiple secondary structure arrangements. The coarse-grained models are capable of accelerating the dynamics by more than three orders of magnitude while preserving the thermodynamics of the systems. Coarse-grained simulations identify relevant structural states in the ensemble with comparable energetics to the all-atom systems. Furthermore, we show that a single coarse-grained potential can integrate all twelve proteins and can capture experimental structural features of mutated proteins. These results indicate that machine learning coarse-grained potentials could provide a feasible approach to simulate and understand protein dynamics.


SPICE, A Dataset of Drug-like Molecules and Peptides for Training Machine Learning Potentials

arXiv.org Artificial Intelligence

Machine learning potentials are an important tool for molecular simulation, but their development is held back by a shortage of high quality datasets to train them on. We describe the SPICE dataset, a new quantum chemistry dataset for training potentials relevant to simulating drug-like small molecules interacting with proteins. It contains over 1.1 million conformations for a diverse set of small molecules, dimers, dipeptides, and solvated amino acids. It includes 15 elements, charged and uncharged molecules, and a wide range of covalent and non-covalent interactions. It provides both forces and energies calculated at the ωB97M-D3(BJ)/def2-TZVPPD level of theory, along with other useful quantities such as multipole moments and bond orders. We train a set of machine learning potentials on it and demonstrate that they can achieve chemical accuracy across a broad region of chemical space. It can serve as a valuable resource for the creation of transferable, ready to use potential functions for use in molecular simulations.


TorchMD-NET: Equivariant Transformers for Neural Network based Molecular Potentials

arXiv.org Artificial Intelligence

The prediction of quantum mechanical properties is historically plagued by a trade-off between accuracy and speed. Machine learning potentials have previously shown great success in this domain, reaching increasingly better accuracy while maintaining computational efficiency comparable with classical force fields. In this work we propose TorchMD-NET, a novel equivariant transformer (ET) architecture, outperforming state-of-the-art on MD17, ANI-1, and many QM9 targets in both accuracy and computational efficiency. Through an extensive attention weight analysis, we gain valuable insights into the black box predictor and show differences in the learned representation of conformers versus conformations sampled from molecular dynamics or normal modes. Furthermore, we highlight the importance of datasets including off-equilibrium conformations for the evaluation of molecular potentials.


TorchMD: A deep learning framework for molecular simulations

arXiv.org Artificial Intelligence

Molecular dynamics simulations provide a mechanistic description of molecules by relying on empirical potentials. The quality and transferability of such potentials can be improved leveraging data-driven models derived with machine learning approaches. Here, we present TorchMD, a framework for molecular simulations with mixed classical and machine learning potentials. All of force computations including bond, angle, dihedral, Lennard-Jones and Coulomb interactions are expressed as PyTorch arrays and operations. Moreover, TorchMD enables learning and simulating neural network potentials. We validate it using standard Amber all-atom simulations, learning an ab-initio potential, performing an end-to-end training and finally learning and simulating a coarse-grained model for protein folding. We believe that TorchMD provides a useful tool-set to support molecular simulations of machine learning potentials. Code and data are freely available at \url{github.com/torchmd}.


Guided Exploration with Proximal Policy Optimization using a Single Demonstration

arXiv.org Artificial Intelligence

Solving sparse reward tasks through exploration is one of the major challenges in deep reinforcement learning, especially in three-dimensional, partially-observable environments. Critically, the algorithm proposed in this article uses a single human demonstration to solve hard-exploration problems. We train an agent on a combination of demonstrations and own experience to solve problems with variable initial conditions. We adapt this idea and integrate it with the proximal policy optimization (PPO). The agent is able to increase its performance and to tackle harder problems by replaying its own past trajectories prioritizing them based on the obtained reward and the maximum value of the trajectory. We compare different variations of this algorithm to behavioral cloning on a set of hard-exploration tasks in the Animal-AI Olympics environment. To the best of our knowledge, learning a task in a three-dimensional environment with comparable difficulty has never been considered before using only one human demonstration.


NAPPO: Modular and scalable reinforcement learning in pytorch

arXiv.org Artificial Intelligence

Reinforcement learning (RL) has been very successful in recent years but, limited by its sample inefficiency, often requires large computational resources. While new methods are being investigated to increase the efficiency of RL algorithms it is critical to enable training at scale, yet using a code-base flexible enough to allow for method experimentation. Here, we present NAPPO, a pytorch-based library for RL which provides scalable proximal policy optimization (PPO) implementations in a simple, modular package. We validate it by replicating previous results on Mujoco and Atari environments. Furthermore, we provide insights on how a variety of distributed training schemes with synchronous and asynchronous communication patterns perform. Finally we showcase NAPPO by obtaining the highest to-date test performance on the Obstacle Tower Unity3D challenge environment. The full source code is available.