Goto

Collaborating Authors

 Chaudhuri, Santanu


MOFA: Discovering Materials for Carbon Capture with a GenAI- and Simulation-Based Workflow

arXiv.org Artificial Intelligence

We present MOFA, an open-source generative AI (GenAI) plus simulation workflow for high-throughput generation of metal-organic frameworks (MOFs) on large-scale high-performance computing (HPC) systems. MOFA addresses key challenges in integrating GPU-accelerated computing for GPU-intensive GenAI tasks, including distributed training and inference, alongside CPU- and GPU-optimized tasks for screening and filtering AI-generated MOFs using molecular dynamics, density functional theory, and Monte Carlo simulations. These heterogeneous tasks are unified within an online learning framework that optimizes the utilization of available CPU and GPU resources across HPC systems. Performance metrics from a 450-node (14,400 AMD Zen 3 CPUs + 1800 NVIDIA A100 GPUs) supercomputer run demonstrate that MOFA achieves high-throughput generation of novel MOF structures, with CO$_2$ adsorption capacities ranking among the top 10 in the hypothetical MOF (hMOF) dataset. Furthermore, the production of high-quality MOFs exhibits a linear relationship with the number of nodes utilized. The modular architecture of MOFA will facilitate its integration into other scientific applications that dynamically combine GenAI with large-scale simulations.


End-to-end AI framework for interpretable prediction of molecular and crystal properties

arXiv.org Artificial Intelligence

We introduce an end-to-end computational framework that allows for hyperparameter optimization using the DeepHyper library, accelerated model training, and interpretable AI inference. The framework is based on state-of-the-art AI models including CGCNN, PhysNet, SchNet, MPNN, MPNN-transformer, and TorchMD-NET. We employ these AI models along with the benchmark QM9, hMOF, and MD17 datasets to showcase how the models can predict user-specified material properties within modern computing environments. We demonstrate transferable applications in the modeling of small molecules, inorganic crystals and nanoporous metal organic frameworks with a unified, standalone framework. We have deployed and tested this framework in the ThetaGPU supercomputer at the Argonne Leadership Computing Facility, and in the Delta supercomputer at the National Center for Supercomputing Applications to provide researchers with modern tools to conduct accelerated AI-driven discovery in leadership-class computing environments. We release these digital assets as open source scientific software in GitLab, and ready-to-use Jupyter notebooks in Google Colab.


GHP-MOFassemble: Diffusion modeling, high throughput screening, and molecular dynamics for rational discovery of novel metal-organic frameworks for carbon capture at scale

arXiv.org Artificial Intelligence

We introduce GHP-MOFassemble, a Generative artificial intelligence (AI), High Performance framework to accelerate the rational design of metal-organic frameworks (MOFs) with high CO2 capacity and synthesizable linkers. Our framework combines a diffusion model, a class of generative AI, to generate novel linkers that are assembled with one of three pre-selected nodes into MOFs in a primitive cubic (pcu) topology. The CO2 capacities of these AI-generated MOFs are predicted using a modified version of the crystal graph convolutional neural network model. We then use the LAMMPS code to perform molecular dynamics simulations to relax the AI-generated MOF structures, and identify those that converge to stable structures, and maintain their porous properties throughout the simulations. Among 120,000 pcu MOF candidates generated by the GHP-MOFassemble framework, with three distinct metal nodes (Cu paddlewheel, Zn paddlewheel, Zn tetramer), a total of 102 structures completed molecular dynamics simulations at 1 bar with predicted CO2 capacity higher than 2 mmol/g at 0.1 bar, which corresponds to the top 5% of hMOFs in the hypothetical MOF (hMOF) dataset in the MOFX-DB database. Among these candidates, 18 have change in density lower than 1% during molecular dynamics simulations, indicating their stability. We also found that the top five GHP-MOFassemble's MOF structures have CO2 capacities higher than 96.9% of hMOF structures. This new approach combines generative AI, graph modeling, large-scale molecular dynamics simulations, and extreme scale computing to open up new pathways for the accelerated discovery of novel MOF structures at scale.


A framework for fully autonomous design of materials via multiobjective optimization and active learning: challenges and next steps

arXiv.org Artificial Intelligence

Data-driven automated laboratories, also called self-driving laboratories, can significantly accelerate molecular synthesis and materials discovery. A key technical challenge of fully autonomous and artificial intelligenceassisted laboratory design is to effectively collect and utilize data from multiple complex processes in order to inform future experimentation. One long-standing template for utilizing experimental and simulation data is the multiresponse surface methodology (RSM) [13], whereby initial data sets are gathered through design of experiments and then statistical models are built for each quantity of interest, analyzed, and hypothesis tested iteratively. In modern scientific and engineering settings, several common paradigms could be considered as specific implementations of this discovery framework, the most common of which are active learning and modelbased optimization techniques such as Bayesian optimization. To account for multiple competing criteria, we utilize an active learning framework based in multiobjective optimization, which utilizes surrogates (such as Gaussian processes), optimization solvers, and multicriteria data acquisition in a closed feedback loop.