Delta-learned force fields for nonbonded interactions: Addressing the strength mismatch between covalent and nonbonded interactions for global models
Cázares-Trejo, Leonardo, Loreto-Silva, Marco, Sauceda, Huziel E.
Noncovalent interactions--vdW dispersion, hydrogen/halogen bonding, ion-$π$, and $π$-stacking--govern structure, dynamics, and emergent phenomena in materials and molecular systems, yet accurately learning them alongside covalent forces remains a core challenge for machine-learned force fields (MLFFs). This challenge is acute for global models that use Coulomb-matrix (CM) descriptors compared under Euclidean/Frobenius metrics in multifragment settings. We show that the mismatch between predominantly covalent force labels and the CM's overrepresentation of intermolecular features biases single-model training and degrades force-field fidelity. To address this, we introduce \textit{$Δ$-sGDML}, a scale-aware formulation within the sGDML framework that explicitly decouples intra- and intermolecular physics by training fragment-specific models alongside a dedicated binding model, then composing them at inference. Across benzene dimers, host-guest complexes (C$_{60}$@buckycatcher, NO$_3^-$@i-corona[6]arene), benzene-water, and benzene-Na$^+$, \mbox{$Δ$-sGDML} delivers consistent gains over a single global model, with fragment-resolved force-error reductions up to \textbf{75\%}, without loss of energy accuracy. Molecular-dynamics simulations further confirm that the $Δ$-model yields a reliable force field for C$_{60}$@buckycatcher, producing stable trajectories across a wide range of temperatures (10-400~K), unlike the single global model, which loses stability above $\sim$200~K. The method offers a practical route to homogenize per-fragment errors and recover reliable noncovalent physics in global MLFFs.
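The composition step at inference can be sketched in a few lines. The model callables and fragment bookkeeping below are illustrative stand-ins under the assumption that each model returns an energy and per-atom forces; this is not the sGDML API.

```python
# Minimal sketch of the Delta-composition idea: fragment-specific models handle
# intramolecular physics, a dedicated binding model supplies the intermolecular
# correction, and the predictions are summed at inference. The callables and
# fragment indexing are hypothetical stand-ins, not the sGDML package API.

def delta_predict(coords, fragments, fragment_models, binding_model):
    """coords: list of (x, y, z) per atom; fragments: list of atom-index lists.

    Each model is assumed to return (energy, per-atom forces)."""
    n = len(coords)
    energy = 0.0
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    # Intramolecular part: each fragment model sees only its own atoms.
    for idx, model in zip(fragments, fragment_models):
        e, f = model([coords[i] for i in idx])
        energy += e
        for local, i in enumerate(idx):
            for k in range(3):
                forces[i][k] += f[local][k]
    # Intermolecular part: the binding model acts on the full complex.
    e_bind, f_bind = binding_model(coords)
    energy += e_bind
    for i in range(n):
        for k in range(3):
            forces[i][k] += f_bind[i][k]
    return energy, forces
```

The key design point is that the fragment models never see intermolecular geometry, so their force labels are not diluted by the much weaker nonbonded signal.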
- North America > Mexico (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Germany > Berlin (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Machine learning surrogate models of many-body dispersion interactions in polymer melts
Shen, Zhaoxiang, Sosa, Raúl I., Lengiewicz, Jakub, Tkatchenko, Alexandre, Bordas, Stéphane P. A.
Accurate prediction of many-body dispersion (MBD) interactions is essential for understanding the van der Waals forces that govern the behavior of many complex molecular systems. However, the high computational cost of MBD calculations limits their direct application in large-scale simulations. In this work, we introduce a machine learning surrogate model specifically designed to predict MBD forces in polymer melts, a system that demands accurate MBD description and offers structural advantages for machine learning approaches. Our model is based on a trimmed SchNet architecture that selectively retains the most relevant atomic connections and incorporates trainable radial basis functions for geometric encoding. We validate our surrogate model on datasets from polyethylene, polypropylene, and polyvinyl chloride melts, demonstrating high predictive accuracy and robust generalization across diverse polymer systems. In addition, the model captures key physical features, such as the characteristic decay behavior of MBD interactions, providing valuable insights for optimizing cutoff strategies. Characterized by high computational efficiency, our surrogate model enables practical incorporation of MBD effects into large-scale molecular simulations.
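The trainable radial basis functions mentioned above can be illustrated with a plain Gaussian expansion of an interatomic distance. The evenly spaced initialization below is a common convention, not necessarily the one used in the paper; in a trainable variant the centers and widths would be optimized parameters.

```python
import math

def gaussian_rbf(distance, centers, widths):
    """Expand a scalar interatomic distance into Gaussian radial basis features.
    In a trainable encoding, centers and widths are learnable parameters;
    here they are plain floats for illustration."""
    return [math.exp(-((distance - c) ** 2) / (2.0 * w ** 2))
            for c, w in zip(centers, widths)]

# Evenly spaced initial centers on [0, cutoff], a common initialization choice.
cutoff, n_basis = 5.0, 8
centers = [cutoff * i / (n_basis - 1) for i in range(n_basis)]
widths = [cutoff / (n_basis - 1)] * n_basis
features = gaussian_rbf(2.5, centers, widths)
```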
Machine Learned Force Fields: Fundamentals, their reach, and challenges
Vital, Carlos A., Armenta-Rico, Román J., Sauceda, Huziel E.
Highly accurate force fields are a mandatory requirement for predictive simulations. In this regard, Machine Learning Force Fields (MLFFs) have emerged as a revolutionary approach in computational chemistry and materials science, combining the accuracy of quantum-mechanical methods with computational efficiency orders of magnitude beyond ab initio methods. This chapter provides an introduction to the fundamentals of learning and how they are applied to construct MLFFs, detailing key methodologies such as neural-network potentials and kernel-based models. Emphasis is placed on the construction of the SchNet model, one of the most elemental neural-network-based force fields and the basis of many modern architectures. Additionally, the GDML framework is described in detail as an example of how the elegant formulation of kernel methods can be used to construct mathematically robust and physics-inspired MLFFs. Ongoing advancements in MLFF development continue to expand their applicability, enabling precise simulations of large and complex systems that were previously beyond reach. The chapter concludes by highlighting the transformative impact of MLFFs on scientific research, underscoring their role in driving future discoveries in chemistry, physics, and materials science.
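GDML itself learns forces through derivatives of the kernel; as a minimal illustration of the kernel-machine side only, the sketch below fits a Gaussian kernel ridge regression on energies. The hyperparameters and helper names are arbitrary choices for the example, not the GDML formulation.

```python
import numpy as np

def rbf_kernel(X, Y, sigma):
    """Gaussian (RBF) kernel matrix between row-vector sets X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def fit_krr(X, y, sigma=1.0, lam=1e-10):
    """Kernel ridge regression: solve (K + lam*I) alpha = y, then
    predict at new points via k(Xq, X) @ alpha."""
    K = rbf_kernel(X, X, sigma)
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)
    return lambda Xq: rbf_kernel(Xq, X, sigma) @ alpha
```

GDML replaces the scalar kernel with its Hessian so that the model predicts forces directly while remaining the exact gradient of an energy, which is what makes it energy-conserving by construction.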
Pretraining Graph Transformers with Atom-in-a-Molecule Quantum Properties for Improved ADMET Modeling
Fallani, Alessio, Nugmanov, Ramil, Arjona-Medina, Jose, Wegner, Jörg Kurt, Tkatchenko, Alexandre, Chernichenko, Kostiantyn
We evaluate the impact of pretraining Graph Transformer architectures on atom-level quantum-mechanical features for the modeling of absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties of drug-like compounds. We compare this pretraining strategy with two others: one based on molecular quantum properties (specifically the HOMO-LUMO gap) and one using a self-supervised atom-masking technique. After fine-tuning on Therapeutic Data Commons ADMET datasets, we evaluate the performance improvement of the different models, observing that models pretrained with atomic quantum-mechanical properties generally produce better results. We then analyse the latent representations and observe that the supervised strategies preserve the pretraining information after fine-tuning, and that different pretrainings produce different trends in latent expressivity across layers. Furthermore, we find that models pretrained on atomic quantum-mechanical properties capture more low-frequency Laplacian eigenmodes of the input graph via the attention weights and produce better representations of atomic environments within the molecule. Applying the analysis to a much larger non-public dataset for microsomal clearance illustrates the generalizability of the studied indicators. In this case the performance of the models accords with the representation analysis and highlights, especially for masking pretraining and atom-level quantum-property pretraining, how model types with similar performance on public benchmarks can perform differently on large-scale pharmaceutical data.
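The low-frequency Laplacian eigenmodes used in the representation analysis can be computed directly from a molecular graph's adjacency matrix. The path-graph example below is purely illustrative.

```python
import numpy as np

def laplacian_eigenmodes(adjacency):
    """Eigenvalues/eigenvectors of the combinatorial graph Laplacian L = D - A,
    sorted by eigenvalue. Low-frequency modes (small eigenvalues) vary smoothly
    over the graph; the zero mode is constant on each connected component."""
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    eigvals, eigvecs = np.linalg.eigh(L)   # eigh returns ascending eigenvalues
    return eigvals, eigvecs

# Path graph on 4 nodes, e.g. a short chain of bonded atoms.
A = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
vals, vecs = laplacian_eigenmodes(A)
```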
Discovery of structure-property relations for molecules via hypothesis-driven active learning over the chemical space
Ghosh, Ayana, Kalinin, Sergei V., Ziatdinov, Maxim A.
Discovering molecular candidates for applications in drug targets, biomolecular systems, catalysts, photovoltaics, organic electronics, and batteries requires machine learning algorithms capable of rapidly exploring chemical spaces for the desired functionalities. Here we introduce a novel approach to active learning over chemical spaces based on hypothesis learning. We construct hypotheses about possible relationships between structures and functionalities of interest from a small subset of data and introduce them as (probabilistic) mean functions for a Gaussian process. This approach combines elements of symbolic-regression methods such as SISSO with active learning in a single framework. The primary aim of this framework is to approximate physical laws in an active-learning regime toward more robust predictive performance, since traditional evaluation on hold-out sets in machine learning does not account for out-of-distribution effects and may fail completely on unseen regions of chemical space. We demonstrate the approach on the QM9 dataset, but it can be applied more broadly to datasets from both molecular and solid-state materials science.
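The role of a hypothesis as a GP mean function can be sketched as Gaussian-process regression on residuals. In the paper the mean comes from SISSO-style symbolic hypotheses and is itself probabilistic; here it is a fixed illustrative function, and the kernel choice is an arbitrary Gaussian.

```python
import numpy as np

def gp_predict(X, y, Xq, mean_fn, sigma=1.0, noise=1e-6):
    """GP regression with a nonzero prior mean: the GP models the residuals
    y - m(X), so the hypothesized law m guides extrapolation where data are
    sparse. Illustrative sketch, not the paper's hypothesis-learning loop."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    m_train = np.array([mean_fn(x) for x in X])
    m_query = np.array([mean_fn(x) for x in Xq])
    K = k(X, X) + noise * np.eye(len(X))
    alpha = np.linalg.solve(K, y - m_train)
    return m_query + k(Xq, X) @ alpha
```

If the data exactly follow the hypothesis, the residual term vanishes and predictions far from the training set fall back on the hypothesized law rather than on a zero prior, which is the point of the construction.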
Stress and heat flux via automatic differentiation
Langer, Marcel F., Frank, J. Thorben, Knoop, Florian
Machine-learning potentials provide computationally efficient and accurate approximations of the Born-Oppenheimer potential energy surface. This potential determines many materials properties, and simulation techniques usually require its gradients, in particular forces and stress for molecular dynamics, and the heat flux for thermal transport properties. Recently developed potentials feature high body order and can include equivariant semi-local interactions through message-passing mechanisms. Due to their complex functional forms, they rely on automatic differentiation (AD), overcoming the need for manual implementations or finite-difference schemes to evaluate gradients. This study demonstrates a unified AD approach to obtain forces, stress, and heat flux for such potentials, and provides a model-independent implementation. The method is tested on the Lennard-Jones potential, and then applied to predict cohesive properties and thermal conductivity of tin selenide using an equivariant message-passing neural network potential.
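The core idea of obtaining gradients by AD rather than manual derivation can be demonstrated on the Lennard-Jones test case the abstract mentions, using a minimal forward-mode dual-number class. This toy implements only the operations the LJ energy needs and is not the paper's framework, which builds on full AD toolchains.

```python
class Dual:
    """Forward-mode AD scalar: value plus derivative (val + eps*dot, eps^2 = 0).
    Only the operations needed by the Lennard-Jones energy are implemented."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = float(val), float(dot)
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val * o.val, self.dot * o.val + self.val * o.dot)
    __rmul__ = __mul__
    def __sub__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val - o.val, self.dot - o.dot)
    def __rtruediv__(self, c):            # c / x for a plain number c
        return Dual(c / self.val, -c * self.dot / self.val ** 2)
    def __pow__(self, n):                 # integer power
        return Dual(self.val ** n, n * self.val ** (n - 1) * self.dot)

def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair energy; works on floats and Dual numbers alike."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 ** 2 - sr6)

# Force along the pair separation: F = -dE/dr, obtained by seeding dot = 1.
r = Dual(2.0 ** (1.0 / 6.0), 1.0)   # the LJ minimum, where the force vanishes
E = lj_energy(r)
force = -E.dot
```

Stress and heat flux follow the same pattern at scale: differentiate one energy expression with respect to the quantities of interest instead of hand-deriving each estimator.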
Automatic Identification of Chemical Moieties
Lederer, Jonas, Gastegger, Michael, Schütt, Kristof T., Kampffmeyer, Michael, Müller, Klaus-Robert, Unke, Oliver T.
In recent years, the prediction of quantum mechanical observables with machine learning methods has become increasingly popular. Message-passing neural networks (MPNNs) solve this task by constructing atomic representations, from which the properties of interest are predicted. Here, we introduce a method to automatically identify chemical moieties (molecular building blocks) from such representations, enabling a variety of applications beyond property prediction, which otherwise rely on expert knowledge. The required representation can either be provided by a pretrained MPNN, or learned from scratch using only structural information. Beyond the data-driven design of molecular fingerprints, the versatility of our approach is demonstrated by enabling the selection of representative entries in chemical databases, the automatic construction of coarse-grained force fields, as well as the identification of reaction coordinates.
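As a naive stand-in for the identification step (atoms whose learned environments are similar end up in the same building block), the sketch below runs a deterministic k-means over per-atom feature vectors. This is an illustration of the grouping idea only, not the paper's algorithm.

```python
import numpy as np

def cluster_atoms(features, k, iters=20):
    """Group atoms by similarity of their (e.g. MPNN-derived) feature vectors.
    Farthest-point initialization keeps the sketch deterministic; a learned
    moiety-identification method replaces this naive clustering in practice."""
    X = np.asarray(features, dtype=float)
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[int(np.argmax(d))])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(axis=2),
                           axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels
```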
Reconstructing Kernel-based Machine Learning Force Fields with Super-linear Convergence
Blücher, Stefan, Müller, Klaus-Robert, Chmiela, Stefan
Kernel machines have sustained continuous progress in the field of quantum chemistry. In particular, they have proven successful in the low-data regime of force-field reconstruction, because the many equivariances and invariances due to physical symmetries can be incorporated into the kernel function, compensating for the lack of much larger datasets. So far, however, the scalability of kernel machines has been hindered by their quadratic memory and cubic runtime complexity in the number of training points. While it is known that iterative Krylov subspace solvers can overcome these burdens, their convergence crucially relies on effective preconditioners, which are elusive in practice. Effective preconditioners need to partially pre-solve the learning problem in a computationally cheap and numerically robust manner. Here, we consider the broad class of Nyström-type methods to construct preconditioners based on successively more sophisticated low-rank approximations of the original kernel matrix, each of which provides a different set of computational trade-offs. All considered methods aim to identify a representative subset of inducing (kernel) columns to approximate the dominant kernel spectrum.
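The basic Nyström construction from inducing columns can be sketched in a few lines. The function name, the jitter regularization, and the naive column selection in the example are illustrative assumptions; the methods in the paper choose and refine the inducing columns far more carefully.

```python
import numpy as np

def nystrom_approx(K, inducing_idx, jitter=1e-10):
    """Nystrom low-rank approximation K ~= C @ pinv(W) @ C.T, where C holds the
    selected kernel columns and W is the kernel restricted to the inducing
    subset. Factors of this form are the building blocks of preconditioners
    for iterative kernel solvers."""
    C = K[:, inducing_idx]
    W = K[np.ix_(inducing_idx, inducing_idx)]
    W_inv = np.linalg.pinv(W + jitter * np.eye(len(inducing_idx)))
    return C @ W_inv @ C.T
```

With all columns as inducing points the approximation reproduces the kernel matrix; with a small representative subset it captures the dominant spectrum at a fraction of the memory, which is exactly what an effective preconditioner needs.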