Gastegger, Michael
Scaling up machine learning-based chemical plant simulation: A method for fine-tuning a model to induce stable fixed points
Esders, Malte, Ramirez, Gimmy Alex Fernandez, Gastegger, Michael, Samal, Satya Swarup
Idealized first-principles models of chemical plants can be inaccurate. An alternative is to fit a Machine Learning (ML) model directly to plant sensor data. We use a structured approach: Each unit within the plant gets represented by one ML model. After fitting the models to the data, the models are connected into a flowsheet-like directed graph. We find that for smaller plants, this approach works well, but for larger plants, the complex dynamics arising from large and nested cycles in the flowsheet lead to instabilities in the solver during model initialization. We show that a high accuracy of the single-unit models is not enough: The gradient can point in unexpected directions, which prevents the solver from converging to the correct stationary state. To address this problem, we present a way to fine-tune ML models such that initialization, even with very simple solvers, becomes robust.
Improved motif-scaffolding with SE(3) flow matching
Yim, Jason, Campbell, Andrew, Mathieu, Emile, Foong, Andrew Y. K., Gastegger, Michael, Jiménez-Luna, José, Lewis, Sarah, Satorras, Victor Garcia, Veeling, Bastiaan S., Noé, Frank, Barzilay, Regina, Jaakkola, Tommi S.
Protein design often begins with knowledge of a desired function from a motif which motif-scaffolding aims to construct a functional protein around. Recently, generative models have achieved breakthrough success in designing scaffolds for a diverse range of motifs. However, the generated scaffolds tend to lack structural diversity, which can hinder success in wet-lab validation. In this work, we extend FrameFlow, an SE(3) flow matching model for protein backbone generation, to perform motif-scaffolding with two complementary approaches. The first is motif amortization, in which FrameFlow is trained with the motif as input using a data augmentation strategy. The second is motif guidance, which performs scaffolding using an estimate of the conditional score from FrameFlow, and requires no additional training. Both approaches achieve an equivalent or higher success rate than previous state-of-the-art methods, with 2.5 times more structurally diverse scaffolds. Code: https://github.com/ microsoft/frame-flow.
Automatic Identification of Chemical Moieties
Lederer, Jonas, Gastegger, Michael, Schütt, Kristof T., Kampffmeyer, Michael, Müller, Klaus-Robert, Unke, Oliver T.
In recent years, the prediction of quantum mechanical observables with machine learning methods has become increasingly popular. Message-passing neural networks (MPNNs) solve this task by constructing atomic representations, from which the properties of interest are predicted. Here, we introduce a method to automatically identify chemical moieties (molecular building blocks) from such representations, enabling a variety of applications beyond property prediction, which otherwise rely on expert knowledge. The required representation can either be provided by a pretrained MPNN, or learned from scratch using only structural information. Beyond the data-driven design of molecular fingerprints, the versatility of our approach is demonstrated by enabling the selection of representative entries in chemical databases, the automatic construction of coarse-grained force fields, as well as the identification of reaction coordinates.
SchNetPack 2.0: A neural network toolbox for atomistic machine learning
Schütt, Kristof T., Hessmann, Stefaan S. P., Gebauer, Niklas W. A., Lederer, Jonas, Gastegger, Michael
SchNetPack is a versatile neural networks toolbox that addresses both the requirements of method development and application of atomistic machine learning. Version 2.0 comes with an improved data pipeline, modules for equivariant neural networks as well as a PyTorch implementation of molecular dynamics. An optional integration with PyTorch Lightning and the Hydra configuration framework powers a flexible command-line interface. This makes SchNetPack 2.0 easily extendable with custom code and ready for complex training task such as generation of 3d molecular structures.
Inverse design of 3d molecular structures with conditional generative neural networks
Gebauer, Niklas W. A., Gastegger, Michael, Hessmann, Stefaan S. P., Müller, Klaus-Robert, Schütt, Kristof T.
The rational design of molecules with desired properties is a long-standing challenge in chemistry. Generative neural networks have emerged as a powerful approach to sample novel molecules from a learned distribution. Here, we propose a conditional generative neural network for 3d molecular structures with specified structural and chemical properties. This approach is agnostic to chemical bonding and enables targeted sampling of novel molecules from conditional distributions, even in domains where reference calculations are sparse. We demonstrate the utility of our method for inverse design by generating molecules with specified composition or motifs, discovering particularly stable molecules, and jointly targeting multiple electronic properties beyond the training regime.
Machine learning of solvent effects on molecular spectra and reactions
Gastegger, Michael, Schütt, Kristof T., Müller, Klaus-Robert
Fast and accurate simulation of complex chemical systems in environments such as solutions is a long standing challenge in theoretical chemistry. In recent years, machine learning has extended the boundaries of quantum chemistry by providing highly accurate and efficient surrogate models of electronic structure theory, which previously have been out of reach for conventional approaches. Those models have long been restricted to closed molecular systems without accounting for environmental influences, such as external electric and magnetic fields or solvent effects. Here, we introduce the deep neural network FieldSchNet for modeling the interaction of molecules with arbitrary external fields. FieldSchNet offers access to a wealth of molecular response properties, enabling it to simulate a wide range of molecular spectra, such as infrared, Raman and nuclear magnetic resonance. Beyond that, it is able to describe implicit and explicit molecular environments, operating as a polarizable continuum model for solvation or in a quantum mechanics / molecular mechanics setup. We employ FieldSchNet to study the influence of solvent effects on molecular spectra and a Claisen rearrangement reaction. Based on these results, we use FieldSchNet to design an external environment capable of lowering the activation barrier of the rearrangement reaction significantly, demonstrating promising venues for inverse chemical design.
Machine Learning Force Fields
Unke, Oliver T., Chmiela, Stefan, Sauceda, Huziel E., Gastegger, Michael, Poltavsky, Igor, Schütt, Kristof T., Tkatchenko, Alexandre, Müller, Klaus-Robert
In recent years, the use of Machine Learning (ML) in computational chemistry has enabled numerous advances previously out of reach due to the computational complexity of traditional electronic-structure methods. One of the most promising applications is the construction of ML-based force fields (FFs), with the aim to narrow the gap between the accuracy of ab initio methods and the efficiency of classical FFs. The key idea is to learn the statistical relation between chemical structure and potential energy without relying on a preconceived notion of fixed chemical bonds or knowledge about the relevant interactions. Such universal ML approximations are in principle only limited by the quality and quantity of the reference data used to train them. This review gives an overview of applications of ML-FFs and the chemical insights that can be obtained from them. The core concepts underlying ML-FFs are described in detail and a step-by-step guide for constructing and testing them from scratch is given. The text concludes with a discussion of the challenges that remain to be overcome by the next generation of ML-FFs.
Symmetry-adapted generation of 3d point sets for the targeted discovery of molecules
Gebauer, Niklas W. A., Gastegger, Michael, Schütt, Kristof T.
Deep learning has proven to yield fast and accurate predictions of quantum-chemical properties to accelerate the discovery of novel molecules and materials. As an exhaustive exploration of the vast chemical space is still infeasible, we require generative models that guide our search towards systems with desired properties. While graph-based models have previously been proposed, they are restricted by a lack of spatial information such that they are unable to recognize spatial isomerism and non-bonded interactions. Here, we introduce a generative neural network for 3d point sets that respects the rotational invariance of the targeted structures. We apply it to the generation of molecules and demonstrate its ability to approximate the distribution of equilibrium structures using spatial metrics as well as established measures from chemoinformatics. As our model is able to capture the complex relationship between 3d geometry and electronic properties, we bias the distribution of the generator towards molecules with a small HOMO-LUMO gap - an important property for the design of organic solar cells.
Molecular Dynamics with Neural-Network Potentials
Gastegger, Michael, Marquetand, Philipp
Molecular dynamics simulations are an important tool for describing the evolution of a chemical system with time. However, these simulations are inherently held back either by the prohibitive cost of accurate electronic structure theory computations or the limited accuracy of classical empirical force fields. Machine learning techniques can help to overcome these limitations by providing access to potential energies, forces and other molecular properties modeled directly after an electronic structure reference at only a fraction of the original computational cost. The present text discusses several practical aspects of conducting machine learning driven molecular dynamics simulations. First, we study the efficient selection of reference data points on the basis of an active learning inspired adaptive sampling scheme. This is followed by the analysis of a machine-learning based model for simulating molecular dipole moments in the framework of predicting infrared spectra via molecular dynamics simulations. Finally, we show that machine learning models can offer valuable aid in understanding chemical systems beyond a simple prediction of quantities.
Machine learning enables long time scale molecular photodynamics simulations
Westermayr, Julia, Gastegger, Michael, Menger, Maximilian F. S. J., Mai, Sebastian, González, Leticia, Marquetand, Philipp
Abstract: Photo-inducedprocesses are fundamental in nature, but accurate simulations are seriously limited by the cost of the underlying quantum chemical calculations, hampering their application for long time scales. Here we introduce a method based on machine learning to overcome this bottleneck and enable accurate photodynamics on nanosecond time scales, which are otherwise out of reach with contemporary approaches. Instead of expensive quantum chemistry during molecular dynamics simulations, we use deep neural networks to learn the relationship between a molecular geometry and its high-dimensional electronic properties. As an example, the time evolution of the methylenimmonium cation for one nanosecond is used to demonstrate that machine learning algorithms can outperform standard excited-state molecular dynamics approaches in their computational efficiency while delivering the same accuracy. Introduction Machine learning (ML) is revolutionizing the most diverse domains, like image recognition [1], playing board games [2], or society integration of refugees [3]. Also in chemistry, anincreasing range of applications is being tackled with ML, for example, the design and discovery of new molecules and materials [4, 5, 6]. In the present study, we show how ML enables efficient photodynamics simulations. Photodynamics is the study of photo-induced processes that occur after a molecule is exposed to light. Photosynthesis or DNA photodamage leading to skin cancer are only two examples of phenomena that involve molecules interacting with light [7, 8, 9, 10, 11]. The simulation of such processes has been key to learn structure-dynamicsfunction relationshipsthat can be used to guide the design of photonic materials, such as photosensitive drugs [12], photocatalysts [4] and photovoltaics [13, 14].