Harb, Hassan
MOFA: Discovering Materials for Carbon Capture with a GenAI- and Simulation-Based Workflow
Yan, Xiaoli, Hudson, Nathaniel, Park, Hyun, Grzenda, Daniel, Pauloski, J. Gregory, Schwarting, Marcus, Pan, Haochen, Harb, Hassan, Foreman, Samuel, Knight, Chris, Gibbs, Tom, Chard, Kyle, Chaudhuri, Santanu, Tajkhorshid, Emad, Foster, Ian, Moosavi, Mohamad, Ward, Logan, Huerta, E. A.
We present MOFA, an open-source generative AI (GenAI) plus simulation workflow for high-throughput generation of metal-organic frameworks (MOFs) on large-scale high-performance computing (HPC) systems. MOFA addresses key challenges in integrating GPU-intensive GenAI tasks, including distributed training and inference, with CPU- and GPU-optimized tasks that screen and filter AI-generated MOFs using molecular dynamics, density functional theory, and Monte Carlo simulations. These heterogeneous tasks are unified within an online learning framework that optimizes the utilization of available CPU and GPU resources across HPC systems. Performance metrics from a run on a 450-node supercomputer (14,400 AMD Zen 3 CPUs + 1,800 NVIDIA A100 GPUs) demonstrate that MOFA achieves high-throughput generation of novel MOF structures, with CO$_2$ adsorption capacities ranking among the top 10 in the hypothetical MOF (hMOF) dataset. Furthermore, the production of high-quality MOFs scales linearly with the number of nodes utilized. The modular architecture of MOFA will facilitate its integration into other scientific applications that dynamically combine GenAI with large-scale simulations.
Reflections from the 2024 Large Language Model (LLM) Hackathon for Applications in Materials Science and Chemistry
Zimmermann, Yoel, Bazgir, Adib, Afzal, Zartashia, Agbere, Fariha, Ai, Qianxiang, Alampara, Nawaf, Al-Feghali, Alexander, Ansari, Mehrad, Antypov, Dmytro, Aswad, Amro, Bai, Jiaru, Baibakova, Viktoriia, Biswajeet, Devi Dutta, Bitzek, Erik, Bocarsly, Joshua D., Borisova, Anna, Bran, Andres M, Brinson, L. Catherine, Calderon, Marcel Moran, Canalicchio, Alessandro, Chen, Victor, Chiang, Yuan, Circi, Defne, Charmes, Benjamin, Chaudhary, Vikrant, Chen, Zizhang, Chiu, Min-Hsueh, Clymo, Judith, Dabhadkar, Kedar, Daelman, Nathan, Datar, Archit, de Jong, Wibe A., Evans, Matthew L., Fard, Maryam Ghazizade, Fisicaro, Giuseppe, Gangan, Abhijeet Sadashiv, George, Janine, Gonzalez, Jose D. Cojal, Götte, Michael, Gupta, Ankur K., Harb, Hassan, Hong, Pengyu, Ibrahim, Abdelrahman, Ilyas, Ahmed, Imran, Alishba, Ishimwe, Kevin, Issa, Ramsey, Jablonka, Kevin Maik, Jones, Colin, Josephson, Tyler R., Juhasz, Greg, Kapoor, Sarthak, Kang, Rongda, Khalighinejad, Ghazal, Khan, Sartaaj, Klawohn, Sascha, Kuman, Suneel, Ladines, Alvin Noe, Leang, Sarom, Lederbauer, Magdalena, Liao, Sheng-Lun, Liu, Hao, Liu, Xuefeng, Lo, Stanley, Madireddy, Sandeep, Maharana, Piyush Ranjan, Maheshwari, Shagun, Mahjoubi, Soroush, Márquez, José A., Mills, Rob, Mohanty, Trupti, Mohr, Bernadette, Moosavi, Seyed Mohamad, Moßhammer, Alexander, Naghdi, Amirhossein D., Naik, Aakash, Narykov, Oleksandr, Näsström, Hampus, Nguyen, Xuan Vu, Ni, Xinyi, O'Connor, Dana, Olayiwola, Teslim, Ottomano, Federico, Ozhan, Aleyna Beste, Pagel, Sebastian, Parida, Chiku, Park, Jaehee, Patel, Vraj, Patyukova, Elena, Petersen, Martin Hoffmann, Pinto, Luis, Pizarro, José M., Plessers, Dieter, Pradhan, Tapashree, Pratiush, Utkarsh, Puli, Charishma, Qin, Andrew, Rajabi, Mahyar, Ricci, Francesco, Risch, Elliot, Ríos-García, Martiño, Roy, Aritra, Rug, Tehseen, Sayeed, Hasan M, Scheidgen, Markus, Schilling-Wilhelmi, Mara, Schloz, Marcel, Schöppach, Fabian, Schumann, Julia, Schwaller, Philippe, Schwarting, Marcus, Sharlin, Samiha, Shen, Kevin, Shi, Jiale, Si, Pradip, D'Souza, Jennifer, Sparks, Taylor, Sudhakar, Suraj, Talirz, Leopold, Tang, Dandan, Taran, Olga, Terboven, Carla, Tropin, Mark, Tsymbal, Anastasiia, Ueltzen, Katharina, Unzueta, Pablo Andres, Vasan, Archit, Vinchurkar, Tirtha, Vo, Trung, Vogel, Gabriel, Völker, Christoph, Weinreich, Jan, Yang, Faradawn, Zaki, Mohd, Zhang, Chi, Zhang, Sylvester, Zhang, Weijie, Zhu, Ruijie, Zhu, Shang, Janssen, Jan, Li, Calvin, Foster, Ian, Blaiszik, Ben
Here, we present the outcomes from the second Large Language Model (LLM) Hackathon for Applications in Materials Science and Chemistry, which engaged participants at hybrid locations around the world and resulted in 34 team submissions. The submissions spanned seven key application areas and demonstrated the diverse utility of LLMs for applications in (1) molecular and material property prediction; (2) molecular and material design; (3) automation and novel interfaces; (4) scientific communication and education; (5) research data management and automation; (6) hypothesis generation and evaluation; and (7) knowledge extraction and reasoning from scientific literature. Each team submission is presented in a summary table with links to the code, and as a brief paper in the appendix. Beyond team results, we discuss the hackathon event and its hybrid format, which included physical hubs in Toronto, Montreal, San Francisco, Berlin, Lausanne, and Tokyo, alongside a global online hub to enable local and virtual collaboration. Overall, the event highlighted significant improvements in LLM capabilities since the previous year's hackathon, suggesting continued expansion of LLMs for applications in materials science and chemistry research. These outcomes demonstrate the dual utility of LLMs as both multipurpose models for diverse machine learning tasks and platforms for rapid prototyping of custom applications in scientific research.