Raja, Sanjeev
Towards Fast, Specialized Machine Learning Force Fields: Distilling Foundation Models via Energy Hessians
Amin, Ishan, Raja, Sanjeev, Krishnapriyan, Aditi
The foundation model (FM) paradigm is transforming Machine Learning Force Fields (MLFFs), leveraging general-purpose representations and scalable training to perform a variety of computational chemistry tasks. Although MLFF FMs have begun to close the accuracy gap relative to first-principles methods, there is still a strong need for faster inference. Additionally, while research is increasingly focused on general-purpose models that transfer across chemical space, practitioners typically study only a small subset of systems at a given time. This underscores the need for fast, specialized MLFFs relevant to specific downstream applications, which preserve test-time physical soundness while maintaining train-time scalability. In this work, we introduce a method for transferring general-purpose representations from MLFF foundation models to smaller, faster MLFFs specialized to specific regions of chemical space. We formulate our approach as a knowledge distillation procedure, where the smaller "student" MLFF is trained to match the Hessians of the energy predictions of the "teacher" foundation model. Our specialized MLFFs can be up to 20$\times$ faster than the original foundation model, while retaining, and in some cases exceeding, its performance and that of undistilled models. We also show that distilling from a teacher model with a direct force parameterization into a student model trained with conservative forces (i.e., forces computed as derivatives of the potential energy) successfully leverages the representations of the large-scale teacher for improved accuracy, while maintaining energy conservation during test-time molecular dynamics simulations. More broadly, our work suggests a new paradigm for MLFF development, in which foundation models are released along with smaller, specialized simulation "engines" for common chemical subsets.
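The Hessian-matching objective described above can be sketched numerically. In this minimal sketch, quadratic toy "energy models" and finite-difference Hessians stand in for the neural networks and automatic differentiation a real implementation would use; all functions and parameters here are illustrative assumptions, not the paper's code.

```python
import numpy as np

# Hypothetical quadratic "energy models": E(x) = 0.5 * x^T H x.
# In the paper, neural networks play these roles; fixed symmetric
# matrices stand in here so the sketch stays self-contained.
def make_energy(H):
    return lambda x: 0.5 * x @ H @ x

def grad_fd(energy, x, eps=1e-5):
    """Central finite-difference gradient of a scalar energy."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x); e[i] = eps
        g[i] = (energy(x + e) - energy(x - e)) / (2 * eps)
    return g

def hessian_fd(energy, x, eps=1e-4):
    """Finite-difference Hessian: differentiate the gradient per coordinate."""
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        e = np.zeros_like(x); e[i] = eps
        H[i] = (grad_fd(energy, x + e) - grad_fd(energy, x - e)) / (2 * eps)
    return H

def hessian_distill_loss(teacher, student, x):
    """MSE between teacher and student energy Hessians at configuration x."""
    return np.mean((hessian_fd(teacher, x) - hessian_fd(student, x)) ** 2)

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3)); A = A + A.T          # teacher curvature
teacher = make_energy(A)
student = make_energy(A + 0.1 * np.eye(3))        # slightly mismatched student
x = rng.normal(size=3)
loss = hessian_distill_loss(teacher, student, x)
```

Minimizing such a loss over student parameters would align the student's local curvature with the teacher's; for these quadratic stand-ins the loss reduces to the mean squared difference of the two fixed Hessians.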
Stability-Aware Training of Neural Network Interatomic Potentials with Differentiable Boltzmann Estimators
Raja, Sanjeev, Amin, Ishan, Pedregosa, Fabian, Krishnapriyan, Aditi S.
Neural network interatomic potentials (NNIPs) are an attractive alternative to ab-initio methods for molecular dynamics (MD) simulations. However, they can produce unstable simulations which sample unphysical states, limiting their usefulness for modeling phenomena occurring over longer timescales. To address these challenges, we present Stability-Aware Boltzmann Estimator (StABlE) Training, a multimodal training procedure which combines conventional supervised training on quantum-mechanical energies and forces with training on reference system observables, to produce stable and accurate NNIPs. StABlE Training iteratively runs MD simulations to seek out unstable regions, and corrects the instabilities via supervision with a reference observable. The training procedure is enabled by the Boltzmann Estimator, which allows efficient computation of the gradients required to train neural networks to match system observables, and can detect both global and local instabilities. We demonstrate our methodology across organic molecules, tetrapeptides, and condensed phase systems, using three modern NNIP architectures. In all three cases, StABlE-trained models achieve significant improvements in simulation stability and recovery of structural and dynamic observables. In some cases, StABlE-trained models outperform conventional models trained on datasets 50 times larger. As a general framework applicable across NNIP architectures and systems, StABlE Training is a powerful tool for training stable and accurate NNIPs, particularly in the absence of large reference datasets.

Molecular dynamics (MD) simulation is a staple method of computational science, enabling high-resolution spatiotemporal modeling of atomistic systems throughout biology, chemistry, and materials science [21]. Under the Born-Oppenheimer approximation, system evolution is governed by the underlying potential energy surface (PES), which is a function of the nuclear Cartesian coordinates [11].
While the atomic forces needed for MD simulation can be obtained on-the-fly via ab-initio quantum-mechanical (QM) calculations [12], the unfavorable scaling of this approach makes it prohibitively expensive for realistic system sizes and timescales [22]. There is a long history of using machine learning (ML) approaches in place of ab-initio methods to efficiently approximate the global PES [7, 6, 2, 55]. NNIPs, typically parameterized as graph neural networks [56, 33], are trained to match the energies and forces of a molecule or material from a reference dataset of QM calculations, such as Density Functional Theory (DFT) [31]. NNIPs trained on large ab-initio datasets are increasingly being used to model challenging and important chemical systems with favorable results [45, 37, 15, 57, 64, 43, 14, 3, 36, 60, 26, 19].
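The relationship between the PES and the forces that drive an MD simulation is simply F = -dE/dr. A minimal sketch, with a Lennard-Jones pair potential standing in for the learned PES (the epsilon and sigma values are arbitrary illustrative choices):

```python
# Conservative forces from a potential energy surface: F = -dE/dr.
# A Lennard-Jones dimer stands in for the PES an NNIP would learn;
# epsilon and sigma are arbitrary illustrative values.
def lj_energy(r, eps=1.0, sigma=1.0):
    sr6 = (sigma / r) ** 6
    return 4 * eps * (sr6 ** 2 - sr6)

def lj_force(r, eps=1.0, sigma=1.0):
    # Analytic negative derivative of lj_energy with respect to r
    sr6 = (sigma / r) ** 6
    return 24 * eps * (2 * sr6 ** 2 - sr6) / r

# Sanity check: the analytic force agrees with -dE/dr by central differences
r, h = 1.3, 1e-6
f_numeric = -(lj_energy(r + h) - lj_energy(r - h)) / (2 * h)
```

An NNIP with a conservative force parameterization obtains the same consistency by construction, differentiating the learned energy with respect to atomic coordinates via autodiff rather than analytically.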
Multi-Stage Fault Warning for Large Electric Grids Using Anomaly Detection and Machine Learning
Raja, Sanjeev, Fokoué, Ernest
In the monitoring of a complex electric grid, it is of paramount importance to provide operators with early warnings of anomalies detected on the network, along with a precise classification and diagnosis of the specific fault type. In this paper, we propose a novel multi-stage early warning system prototype for electric grid fault detection, classification, subgroup discovery, and visualization. In the first stage, a computationally efficient anomaly detection method based on quartiles detects the presence of a fault in real time. In the second stage, the fault is classified into one of nine pre-defined disaster scenarios. The time series data are first mapped to highly discriminative features by applying dimensionality reduction based on temporal autocorrelation. The features are then passed to one of three classification techniques: support vector machine, random forest, and artificial neural network. Finally, in the third stage, intra-class clustering based on dynamic time warping is used to characterize the fault with further granularity. Results on the Bonneville Power Administration electric grid data show that i) the proposed anomaly detector is both fast and accurate; ii) dimensionality reduction leads to dramatic improvement in classification accuracy and speed; iii) the random forest method offers the most accurate, consistent, and robust fault classification; and iv) time series within a given class naturally separate into five distinct clusters which correspond closely to the geographical distribution of electric grid buses.
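The first-stage quartile-based detector can be illustrated with the classic Tukey IQR fence. The 1.5x multiplier and the synthetic signal below are illustrative assumptions, not necessarily the paper's exact configuration.

```python
import numpy as np

# Classic Tukey quartile (IQR) fence, in the spirit of the quartile-based
# anomaly detector described above; the k=1.5 multiplier is the conventional
# choice, not necessarily the paper's.
def iqr_anomalies(x, k=1.5):
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return np.flatnonzero((x < lo) | (x > hi))

rng = np.random.default_rng(42)
signal = rng.normal(size=200)   # nominal measurements
signal[100] += 10.0             # injected fault-like spike
flagged = iqr_anomalies(signal)
```

Because the fence depends only on quartiles, it is robust to the spike itself and cheap enough to run in a streaming, real-time setting, which matches the speed requirement stated in the abstract.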