Collaborating Authors: Csanyi, Gabor


A practical guide to machine learning interatomic potentials -- Status and future

arXiv.org Artificial Intelligence

The rapid development and large body of literature on machine learning interatomic potentials (MLIPs) can make it difficult for researchers who are not experts, but who wish to use these tools, to know how to proceed. The spirit of this review is to help such researchers by serving as a practical, accessible guide to the state of the art in MLIPs. The review covers a broad range of topics, including: (i) central aspects of how and why MLIPs enable many exciting advances in molecular modeling; (ii) the main underpinnings of the different types of MLIPs, including their basic structure and formalism; (iii) the potentially transformative impact of universal MLIPs for both organic and inorganic systems, including an overview of the most recent advances, capabilities, downsides, and potential applications of this nascent class of MLIPs; (iv) a practical guide to estimating and understanding the execution speed of MLIPs, with guidance for users based on hardware availability, the type of MLIP used, and the prospective simulation size and duration; (v) a manual for choosing an MLIP for a given application, considering hardware resources, speed requirements, and energy and force accuracy requirements, as well as guidance on choosing a pre-trained potential versus fitting a new potential from scratch; (vi) a discussion of MLIP infrastructure, including sources of training data, pre-trained potentials, and hardware resources for training; (vii) a summary of key limitations of present MLIPs and current approaches to mitigating them, including methods for including long-range interactions, handling magnetic systems, and treating excited states; and finally (viii) some more speculative thoughts on what the future holds for the development and application of MLIPs over the next 3-10+ years.
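Point (iv) above is, at its core, linear-scaling arithmetic: once a per-atom, per-step cost has been measured for a given MLIP on given hardware, the projected wall time of a simulation follows directly, at least for MLIPs whose cost grows roughly linearly with system size. A minimal sketch (the function name and the cost figure are illustrative assumptions, not numbers from the review):

```python
def estimate_walltime_hours(n_atoms, sim_ns, timestep_fs, sec_per_atom_step):
    """Back-of-envelope wall time for an MD run whose MLIP cost scales
    linearly with system size.  `sec_per_atom_step` must be measured on
    the target hardware for the chosen MLIP; it is an input here, not a
    benchmark result."""
    n_steps = sim_ns * 1e6 / timestep_fs  # 1 ns = 1e6 fs of simulated time
    return n_steps * n_atoms * sec_per_atom_step / 3600.0
```

For example, 1 ns of dynamics for 1,000 atoms with a 1 fs timestep at a hypothetical 1e-6 s per atom-step works out to about 0.28 hours; halving the measured per-atom cost or the simulated time halves the estimate.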


Evaluation of the MACE Force Field Architecture: from Medicinal Chemistry to Materials Science

arXiv.org Machine Learning

Machine learning force fields are becoming part of the standard toolbox of computational chemists, as demonstrated by the increasing number of successful applications leading to new scientific discoveries, in fields including amorphous materials [1], high-pressure systems [2], phase diagrams [3] and reaction dynamics of molecules [4]. These applications were enabled by a significant effort in developing a wide range of novel machine learning force field architectures. Recently, many of these were incorporated into a single, unifying design space [5, 6], which helped uncover the relationship between seemingly dissimilar approaches such as the descriptor based machine learning force fields [7-12] and graph neural network based models [13-17]. This new understanding directly led to the MACE architecture. The resulting model is not only accurate on an independent test set but is also able to run long, stable molecular dynamics simulations without the need for any parameter tuning or any further iterative training. In Sections VI, VII and IX, we test MACE on condensed phase systems (carbon, disordered materials and liquid water), showing considerable improvements in accuracy compared to the models that were previously tested on these datasets. In the case of water, we also show that the resulting MACE model can accurately describe thermodynamic and kinetic properties using NVT and NPT molecular dynamics simulations. Finally, in Section X we evaluate the MACE architecture on the QM9 machine learning benchmark, demonstrating that it also improves on the state of the art for several targets that are not force field fitting tasks.


Ranking the information content of distance measures

arXiv.org Machine Learning

Real-world data typically contain a large number of features that are often heterogeneous in nature, relevance, and even units of measure. When assessing the similarity between data points, one can build various distance measures using subsets of these features. Using the fewest features while still retaining sufficient information about the system is crucial in many statistical learning approaches, particularly when data are sparse. We introduce a statistical test that can assess the relative information retained when using two different distance measures, and determine whether they are equivalent, independent, or whether one is more informative than the other. This in turn allows one to find the most informative distance measure out of a pool of candidates. The approach is applied to find the most relevant policy variables for controlling the COVID-19 epidemic and to find compact yet informative representations of atomic structures, but its potential applications are wide-ranging across many branches of science.
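The comparison between two distance measures rests on neighbourhood structure: if the nearest neighbours under metric A are also close under metric B, then A retains B's information. As a rough illustration of this idea (a simplified sketch, not the authors' exact estimator; the 2/N normalisation and the tie handling here are assumptions of this sketch), a rank-based statistic of this kind can be written as:

```python
import numpy as np

def pairwise_distances(x):
    """Euclidean distance matrix for an (N, d) array of points."""
    return np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)

def information_imbalance(d_a, d_b):
    """Rank-based statistic Delta(A -> B), roughly in [0, 1]:
    the average rank, under distance B, of each point's nearest
    neighbour under distance A, scaled by 2/N.  Values near 0 mean
    A's neighbourhoods predict B's; values near 1 mean A carries
    essentially no information about B."""
    n = d_a.shape[0]
    d_a = d_a + np.diag(np.full(n, np.inf))        # exclude self-matches
    ranks_b = d_b.argsort(axis=1).argsort(axis=1)  # rank of point j in row i under B
    nn_a = d_a.argmin(axis=1)                      # nearest neighbour under A
    return 2.0 * ranks_b[np.arange(n), nn_a].mean() / n
```

Computing the statistic in both directions then classifies the pair: both values small suggests the two measures are equivalent, both near 1 suggests they are independent, and an asymmetric result suggests one measure is more informative than the other.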