Goto

Collaborating Authors

 adsorbate


Towards Fully Automated Molecular Simulations: Multi-Agent Framework for Simulation Setup and Force Field Extraction

Petković, Marko, Menkovski, Vlado, Calero, Sofía

arXiv.org Artificial Intelligence

Automated characterization of porous materials has the potential to accelerate materials discovery, but it remains limited by the complexity of simulation setup and force field selection. We propose a multi-agent framework in which LLM-based agents can autonomously understand a characterization task, plan appropriate simulations, assemble relevant force fields, execute them and interpret their results to guide subsequent steps. As a first step toward this vision, we present a multi-agent system for literature-informed force field extraction and automated RASPA simulation setup. Initial evaluations demonstrate high correctness and reproducibility, highlighting this approach's potential to enable fully autonomous, scalable materials characterization.


Transition States Energies from Machine Learning: An Application to Reverse Water-Gas Shift on Single-Atom Alloys

Cheula, Raffaele, Andersen, Mie

arXiv.org Artificial Intelligence

Obtaining accurate transition state (TS) energies is a bottleneck in computational screening of complex materials and reaction networks due to the high cost of TS search methods and first-principles methods such as density functional theory (DFT). Here we propose a machine learning (ML) model for predicting TS energies based on Gaussian process regression with the Wasserstein Weisfeiler-Lehman graph kernel (WWL-GPR). Applying the model to predict adsorption and TS energies for the reverse water-gas shift (RWGS) reaction on single-atom alloy (SAA) catalysts, we show that it can significantly improve the accuracy compared to traditional approaches based on scaling relations or ML models without a graph representation. Further benefitting from the low cost of model training, we train an ensemble of WWL-GPR models to obtain uncertainties through subsampling of the training data and show how these uncertainties propagate to turnover frequency (TOF) predictions through the construction of an ensemble of microkinetic models. Comparing the errors in model-based vs DFT-based TOF predictions, we show that the WWL-GPR model reduces errors by almost an order of magnitude compared to scaling relations. This demonstrates the critical impact of accurate energy predictions on catalytic activity estimation. Finally, we apply our model to screen new materials, identifying promising catalysts for RWGS. This work highlights the power of combining advanced ML techniques with DFT and microkinetic modeling for screening catalysts for complex reactions like RWGS, providing a robust framework for future catalyst design.


Adsorb-Agent: Autonomous Identification of Stable Adsorption Configurations via Large Language Model Agent

Ock, Janghoon, Vinchurkar, Tirtha, Jadhav, Yayati, Farimani, Amir Barati

arXiv.org Artificial Intelligence

Adsorption energy is a key reactivity descriptor in catalysis, enabling efficient screening for optimal catalysts. However, determining adsorption energy typically requires evaluating numerous adsorbate-catalyst configurations. Current algorithmic approaches rely on exhaustive enumeration of adsorption sites and configurations, which makes the process computationally intensive and does not inherently guarantee the identification of the global minimum energy. In this work, we introduce Adsorb-Agent, a Large Language Model (LLM) agent designed to efficiently identify system-specific stable adsorption configurations corresponding to the global minimum adsorption energy. Adsorb-Agent leverages its built-in knowledge and emergent reasoning capabilities to strategically explore adsorption configurations likely to hold adsorption energy. By reducing the reliance on exhaustive sampling, it significantly decreases the number of initial configurations required while improving the accuracy of adsorption energy predictions. We evaluate Adsorb-Agent's performance across twenty representative systems encompassing a range of complexities. The Adsorb-Agent successfully identifies comparable adsorption energies for 83.7% of the systems and achieves lower energies, closer to the actual global minimum, for 35% of the systems, while requiring significantly fewer initial configurations than conventional methods. Its capability is particularly evident in complex systems, where it identifies lower adsorption energies for 46.7% of systems involving intermetallic surfaces and 66.7% of systems with large adsorbate molecules. These results demonstrate the potential of Adsorb-Agent to accelerate catalyst discovery by reducing computational costs and improving the reliability of adsorption energy predictions.


AdsorbRL: Deep Multi-Objective Reinforcement Learning for Inverse Catalysts Design

Lacombe, Romain, Hendren, Lucas, El-Awady, Khalid

arXiv.org Artificial Intelligence

A central challenge of the clean energy transition is the development of catalysts for low-emissions technologies. Recent advances in Machine Learning for quantum chemistry drastically accelerate the computation of catalytic activity descriptors such as adsorption energies. Here we introduce AdsorbRL, a Deep Reinforcement Learning agent aiming to identify potential catalysts given a multi-objective binding energy target, trained using offline learning on the Open Catalyst 2020 and Materials Project data sets. We experiment with Deep Q-Network agents to traverse the space of all ~160,000 possible unary, binary and ternary compounds of 55 chemical elements, with very sparse rewards based on adsorption energy known for only between 2,000 and 3,000 catalysts per adsorbate. To constrain the actions space, we introduce Random Edge Traversal and train a single-objective DQN agent on the known states subgraph, which we find strengthens target binding energy by an average of 4.1 eV. We extend this approach to multi-objective, goal-conditioned learning, and train a DQN agent to identify materials with the highest (respectively lowest) adsorption energies for multiple simultaneous target adsorbates. We experiment with Objective Sub-Sampling, a novel training scheme aimed at encouraging exploration in the multi-objective setup, and demonstrate simultaneous adsorption energy improvement across all target adsorbates, by an average of 0.8 eV. Overall, our results suggest strong potential for Deep Reinforcement Learning applied to the inverse catalysts design problem.


The Open DAC 2023 Dataset and Challenges for Sorbent Discovery in Direct Air Capture

Sriram, Anuroop, Choi, Sihoon, Yu, Xiaohan, Brabson, Logan M., Das, Abhishek, Ulissi, Zachary, Uyttendaele, Matt, Medford, Andrew J., Sholl, David S.

arXiv.org Artificial Intelligence

New methods for carbon dioxide removal are urgently needed to combat global climate change. Direct air capture (DAC) is an emerging technology to capture carbon dioxide directly from ambient air. Metal-organic frameworks (MOFs) have been widely studied as potentially customizable adsorbents for DAC. However, discovering promising MOF sorbents for DAC is challenging because of the vast chemical space to explore and the need to understand materials as functions of humidity and temperature. We explore a computational approach benefiting from recent innovations in machine learning (ML) and present a dataset named Open DAC 2023 (ODAC23) consisting of more than 38M density functional theory (DFT) calculations on more than 8,400 MOF materials containing adsorbed $CO_2$ and/or $H_2O$. ODAC23 is by far the largest dataset of MOF adsorption calculations at the DFT level of accuracy currently available. In addition to probing properties of adsorbed molecules, the dataset is a rich source of information on structural relaxation of MOFs, which will be useful in many contexts beyond specific applications for DAC. A large number of MOFs with promising properties for DAC are identified directly in ODAC23. We also trained state-of-the-art ML models on this dataset to approximate calculations at the DFT level. This open-source dataset and our initial ML models will provide an important baseline for future efforts to identify MOFs for a wide range of applications, including DAC.


On the importance of catalyst-adsorbate 3D interactions for relaxed energy predictions

Carbonero, Alvaro, Duval, Alexandre, Schmidt, Victor, Miret, Santiago, Hernandez-Garcia, Alex, Bengio, Yoshua, Rolnick, David

arXiv.org Artificial Intelligence

The use of machine learning for material property prediction and discovery has traditionally centered on graph neural networks that incorporate the geometric configuration of all atoms. However, in practice not all this information may be readily available, e.g. when evaluating the potentially unknown binding of adsorbates to catalyst. In this paper, we investigate whether it is possible to predict a system's relaxed energy in the OC20 dataset while ignoring the relative position of the adsorbate with respect to the electro-catalyst. We consider SchNet, DimeNet++ and FAENet as base architectures and measure the impact of four modifications on model performance: removing edges in the input graph, pooling independent representations, not sharing the backbone weights and using an attention mechanism to propagate non-geometric relative information. We find that while removing binding site information impairs accuracy as expected, modified models are able to predict relaxed energies with remarkably decent MAE. Our work suggests future research directions in accelerated materials discovery where information on reactant configurations can be reduced or altogether omitted.


AdsorbML: A Leap in Efficiency for Adsorption Energy Calculations using Generalizable Machine Learning Potentials

Lan, Janice, Palizhati, Aini, Shuaibi, Muhammed, Wood, Brandon M., Wander, Brook, Das, Abhishek, Uyttendaele, Matt, Zitnick, C. Lawrence, Ulissi, Zachary W.

arXiv.org Artificial Intelligence

Computational catalysis is playing an increasingly significant role in the design of catalysts across a wide range of applications. A common task for many computational methods is the need to accurately compute the adsorption energy for an adsorbate and a catalyst surface of interest. Traditionally, the identification of low energy adsorbate-surface configurations relies on heuristic methods and researcher intuition. As the desire to perform high-throughput screening increases, it becomes challenging to use heuristics and intuition alone. In this paper, we demonstrate machine learning potentials can be leveraged to identify low energy adsorbate-surface configurations more accurately and efficiently. Our algorithm provides a spectrum of trade-offs between accuracy and efficiency, with one balanced option finding the lowest energy configuration 87.36% of the time, while achieving a 2000x speedup in computation. To standardize benchmarking, we introduce the Open Catalyst Dense dataset containing nearly 1,000 diverse surfaces and 100,000 unique configurations.


Towards Predicting Equilibrium Distributions for Molecular Systems with Deep Learning

Zheng, Shuxin, He, Jiyan, Liu, Chang, Shi, Yu, Lu, Ziheng, Feng, Weitao, Ju, Fusong, Wang, Jiaxi, Zhu, Jianwei, Min, Yaosen, Zhang, He, Tang, Shidi, Hao, Hongxia, Jin, Peiran, Chen, Chi, Noé, Frank, Liu, Haiguang, Liu, Tie-Yan

arXiv.org Artificial Intelligence

Advances in deep learning have greatly improved structure prediction of molecules. However, many macroscopic observations that are important for real-world applications are not functions of a single molecular structure, but rather determined from the equilibrium distribution of structures. Traditional methods for obtaining these distributions, such as molecular dynamics simulation, are computationally expensive and often intractable. In this paper, we introduce a novel deep learning framework, called Distributional Graphormer (DiG), in an attempt to predict the equilibrium distribution of molecular systems. Inspired by the annealing process in thermodynamics, DiG employs deep neural networks to transform a simple distribution towards the equilibrium distribution, conditioned on a descriptor of a molecular system, such as a chemical graph or a protein sequence. This framework enables efficient generation of diverse conformations and provides estimations of state densities. We demonstrate the performance of DiG on several molecular tasks, including protein conformation sampling, ligand structure sampling, catalyst-adsorbate sampling, and property-guided structure generation. DiG presents a significant advancement in methodology for statistically understanding molecular systems, opening up new research opportunities in molecular science.


The Open Catalyst 2022 (OC22) Dataset and Challenges for Oxide Electrocatalysts

Tran, Richard, Lan, Janice, Shuaibi, Muhammed, Wood, Brandon M., Goyal, Siddharth, Das, Abhishek, Heras-Domingo, Javier, Kolluru, Adeesh, Rizvi, Ammar, Shoghi, Nima, Sriram, Anuroop, Therrien, Felix, Abed, Jehad, Voznyy, Oleksandr, Sargent, Edward H., Ulissi, Zachary, Zitnick, C. Lawrence

arXiv.org Artificial Intelligence

The development of machine learning models for electrocatalysts requires a broad set of training data to enable their use across a wide variety of materials. One class of materials that currently lacks sufficient training data is oxides, which are critical for the development of OER catalysts. To address this, we developed the OC22 dataset, consisting of 62,331 DFT relaxations (~9,854,504 single point calculations) across a range of oxide materials, coverages, and adsorbates. We define generalized total energy tasks that enable property prediction beyond adsorption energies; we test baseline performance of several graph neural networks; and we provide pre-defined dataset splits to establish clear benchmarks for future efforts. In the most general task, GemNet-OC sees a ~36% improvement in energy predictions when combining the chemically dissimilar OC20 and OC22 datasets via fine-tuning. Similarly, we achieved a ~19% improvement in total energy predictions on OC20 and a ~9% improvement in force predictions in OC22 when using joint training. We demonstrate the practical utility of a top performing model by capturing literature adsorption energies and important OER scaling relationships. We expect OC22 to provide an important benchmark for models seeking to incorporate intricate long-range electrostatic and magnetic interactions in oxide surfaces. Dataset and baseline models are open sourced, and a public leaderboard is available to encourage continued community developments on the total energy tasks and data.


Molecular Geometry-aware Transformer for accurate 3D Atomic System modeling

Yuan, Zheng, Zhang, Yaoyun, Tan, Chuanqi, Wang, Wei, Huang, Fei, Huang, Songfang

arXiv.org Artificial Intelligence

Molecular dynamic simulations are important in computational physics, chemistry, material, and biology. Machine learning-based methods have shown strong abilities in predicting molecular energy and properties and are much faster than DFT calculations. Molecular energy is at least related to atoms, bonds, bond angles, torsion angles, and nonbonding atom pairs. Previous Transformer models only use atoms as inputs which lack explicit modeling of the aforementioned factors. To alleviate this limitation, we propose Moleformer, a novel Transformer architecture that takes nodes (atoms) and edges (bonds and nonbonding atom pairs) as inputs and models the interactions among them using rotational and translational invariant geometry-aware spatial encoding. Proposed spatial encoding calculates relative position information including distances and angles among nodes and edges. We benchmark Moleformer on OC20 and QM9 datasets, and our model achieves state-of-the-art on the initial state to relaxed energy prediction of OC20 and is very competitive in QM9 on predicting quantum chemical properties compared to other Transformer and Graph Neural Network (GNN) methods which proves the effectiveness of the proposed geometry-aware spatial encoding in Moleformer.