EGMOF: Efficient Generation of Metal-Organic Frameworks Using a Hybrid Diffusion-Transformer Architecture
Han, Seunghee, Kang, Yeonghun, Bae, Taeun, Bernales, Varinia, Aspuru-Guzik, Alan, Kim, Jihan
Designing materials with targeted properties remains challenging due to the vastness of chemical space and the scarcity of property-labeled data. While recent advances in generative models offer a promising way for inverse design, most approaches require large datasets and must be retrained for every new target property. Here, we introduce EGMOF (Efficient Generation of MOFs), a hybrid diffusion-transformer framework that overcomes these limitations through a modular, descriptor-mediated workflow. EGMOF decomposes inverse design into two steps: (1) a one-dimensional diffusion model (Prop2Desc) that maps desired properties to chemically meaningful descriptors, followed by (2) a transformer model (Desc2MOF) that generates structures from these descriptors. This modular hybrid design enables minimal retraining and maintains high accuracy even under small-data conditions. On a hydrogen uptake dataset, EGMOF achieved over 95% validity and an 84% hit rate, improvements of up to 57% in validity and 14% in hit rate over existing methods, while remaining effective with only 1,000 training samples. Moreover, our model successfully performed conditional generation across 29 diverse property datasets, including CoREMOF, QMOF, and text-mined experimental datasets, whereas previous models have not. This work presents a data-efficient, generalizable approach to the inverse design of diverse MOFs and highlights the potential of modular inverse design workflows for broader materials discovery.
- Workflow (1.00)
- Research Report > New Finding (0.46)
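The two-step, descriptor-mediated workflow EGMOF describes can be sketched at a high level. The class and function names below (`Prop2Desc`, `Desc2MOF`, `inverse_design`) are hypothetical stand-ins, not the authors' actual API, and the models are replaced with trivial placeholders so that only the modular data flow is shown:

```python
import random


class Prop2Desc:
    """Placeholder for the 1D diffusion model: property -> descriptors."""

    def sample(self, target_property: float, n_descriptors: int = 4) -> list:
        # A real diffusion model would iteratively denoise; here we emit
        # seeded pseudo-descriptors purely to illustrate the interface.
        rng = random.Random(int(target_property * 1000))
        return [rng.uniform(0.0, 1.0) for _ in range(n_descriptors)]


class Desc2MOF:
    """Placeholder for the transformer: descriptors -> MOF identifier."""

    def generate(self, descriptors: list) -> str:
        # Encode each descriptor bucket as a token of a toy MOF string.
        tokens = ["B%d" % int(d * 10) for d in descriptors]
        return "-".join(["MOF"] + tokens)


def inverse_design(target_property: float) -> str:
    """Step 1: property -> descriptors. Step 2: descriptors -> structure."""
    descriptors = Prop2Desc().sample(target_property)
    return Desc2MOF().generate(descriptors)
```

Because the two stages only communicate through the descriptor vector, retargeting a new property would require retraining only the first, cheaper stage, which is the modularity the abstract emphasizes.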
L^2M^3OF: A Large Language Multimodal Model for Metal-Organic Frameworks
Cui, Jiyu, Wu, Fang, Zhao, Haokai, Feng, Minggao, Evangelopoulos, Xenophon, Cooper, Andrew I., Choi, Yejin
Large language models have demonstrated remarkable reasoning capabilities across diverse natural language tasks. However, comparable breakthroughs in scientific discovery are more limited, because understanding complex physical phenomena demands multifaceted representations far beyond language alone. A compelling example is the design of functional materials such as MOFs, which are critical for a range of impactful applications like carbon capture and hydrogen storage. Navigating their vast and intricate design space in language-based representations interpretable by LLMs is challenging due to the numerous possible three-dimensional atomic arrangements and strict reticular rules of coordination geometry and topology. Despite promising early results in LLM-assisted discovery for simpler materials systems, MOF design remains heavily reliant on tacit human expertise rarely codified in textual information alone. To overcome this barrier, we introduce L2M3OF, the first multimodal LLM for MOFs. L2M3OF integrates crystal representation learning with language understanding to process structural, textual, and knowledge modalities jointly. L2M3OF employs a pre-trained crystal encoder with a lightweight projection layer to compress structural information into a token space, enabling efficient alignment with language instructions. To facilitate training and evaluation, we curate a structure-property-knowledge database of crystalline materials and benchmark L2M3OF against state-of-the-art closed-source LLMs such as GPT-5, Gemini-2.5-Pro and DeepSeek-R1. Experiments show that L2M3OF outperforms leading text-based closed-source LLMs in property prediction and knowledge generation tasks, despite using far fewer parameters. These results highlight the importance of multimodal approaches for porous material understanding and establish L2M3OF as a foundation for next-generation AI systems in materials discovery.
- Health & Medicine > Pharmaceuticals & Biotechnology (0.46)
- Energy > Renewable > Hydrogen (0.34)
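The "lightweight projection layer" idea above — mapping a frozen crystal encoder's embedding into a language model's token space — can be illustrated with a minimal linear map. Dimensions, names, and the random weights below are illustrative placeholders, not details from the paper:

```python
import math
import random


def linear(x, weights, bias):
    """y = W x + b: project a vector into another vector space."""
    return [sum(w_i * x_i for w_i, x_i in zip(row, x)) + b_i
            for row, b_i in zip(weights, bias)]


def project_crystal_embedding(crystal_emb, token_dim, seed=0):
    """Map a (frozen) crystal-encoder embedding into an LLM token space.

    In a multimodal LLM the projection would be the small trained
    component; here its weights are random placeholders standing in
    for learned parameters.
    """
    rng = random.Random(seed)
    d_in = len(crystal_emb)
    weights = [[rng.gauss(0, 1 / math.sqrt(d_in)) for _ in range(d_in)]
               for _ in range(token_dim)]
    bias = [0.0] * token_dim
    return linear(crystal_emb, weights, bias)
```

The projected vector can then be interleaved with ordinary text-token embeddings, which is what lets the language model condition on structure without retraining the crystal encoder.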
Steering an Active Learning Workflow Towards Novel Materials Discovery via Queue Prioritization
Schwarting, Marcus, Ward, Logan, Hudson, Nathaniel, Yan, Xiaoli, Blaiszik, Ben, Chaudhuri, Santanu, Huerta, Eliu, Foster, Ian
Generative AI poses both opportunities and risks for solving inverse design problems in the sciences. Generative tools provide the ability to expand and refine a search space autonomously, but do so at the cost of exploring low-quality regions until sufficiently fine-tuned. Here, we propose a queue prioritization algorithm that combines generative modeling and active learning in the context of a distributed workflow for exploring complex design spaces. We find that incorporating an active learning model to prioritize top design candidates can prevent a generative AI workflow from expending resources on nonsensical candidates and halt potential generative model decay. For an existing generative AI workflow for discovering novel molecular structure candidates for carbon capture, our active learning approach significantly increases the number of high-quality candidates identified by the generative model. We find that, out of 1000 novel candidates, our workflow without active learning generates an average of 281 high-performing candidates, while our proposed prioritization with active learning generates an average of 604 high-performing candidates.
- Energy (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.30)
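The core of such a queue prioritization step — ranking generated candidates by a cheap surrogate model before spending simulation budget — can be sketched with a heap. The function names and the surrogate interface are assumptions for illustration, not the paper's implementation:

```python
import heapq


def prioritize(candidates, surrogate, budget):
    """Rank candidates by a surrogate (active-learning) model and return
    only the top `budget` of them for expensive downstream simulation."""
    # Negate scores because heapq implements a min-heap.
    scored = [(-surrogate(c), i, c) for i, c in enumerate(candidates)]
    heapq.heapify(scored)
    return [heapq.heappop(scored)[2]
            for _ in range(min(budget, len(scored)))]
```

For example, `prioritize(batch, predicted_co2_uptake, 50)` would forward only the 50 candidates the surrogate scores highest, which is how low-quality regions of the generator's output get filtered out before simulation.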
Flexible MOF Generation with Torsion-Aware Flow Matching
Kim, Nayoung, Kim, Seongsu, Ahn, Sungsoo
Designing metal-organic frameworks (MOFs) with novel chemistries is a longstanding challenge due to their large combinatorial space and complex 3D arrangements of the building blocks. While recent deep generative models have enabled scalable MOF generation, they assume (1) a fixed set of building blocks and (2) known local 3D coordinates of building blocks. However, this limits their ability to (1) design novel MOFs and (2) generate the structure using novel building blocks. We propose a two-stage MOF generation framework that overcomes these limitations by modeling both chemical and geometric degrees of freedom. First, we train an SMILES-based autoregressive model to generate metal and organic building blocks, paired with a cheminformatics toolkit for 3D structure initialization. Second, we introduce a flow matching model that predicts translations, rotations, and torsional angles to assemble the blocks into valid 3D frameworks. Our experiments demonstrate improved reconstruction accuracy, the generation of valid, novel, and unique MOFs, and the ability to create novel building blocks. Our code is available at https://github.com/nayoung10/MOFFlow-2.
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
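The torsion-aware flow matching idea above can be made concrete for a single torsional angle. This is a generic conditional-flow-matching sketch on the circle (linear interpolant, constant velocity target), not the paper's exact parameterization; the shortest-path wrapping is the detail that distinguishes angular coordinates from plain Euclidean ones:

```python
import math


def wrap(angle):
    """Wrap an angle into (-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def torsion_flow_target(phi0, phi1, t):
    """Interpolant and velocity target for one torsion angle.

    Standard conditional flow matching uses x_t = (1-t)*x0 + t*x1 with
    velocity target x1 - x0; for torsions we interpolate along the
    shortest arc on the circle instead of through flat angle space.
    """
    delta = wrap(phi1 - phi0)        # shortest angular displacement
    phi_t = wrap(phi0 + t * delta)   # point on the interpolating path
    velocity = delta                 # constant regression target
    return phi_t, velocity
```

A flow network trained to regress `velocity` from `(phi_t, t)` can then generate torsions by integrating the learned field from random initial angles; translations and rotations would be handled with analogous interpolants on their own manifolds.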
Interaction Topological Transformer for Multiscale Learning in Porous Materials
Chen, Dong, Liu, Jian, Chen, Chun-Long, Wei, Guo-Wei
Porous materials exhibit vast structural diversity and support critical applications in gas storage, separations, and catalysis. However, predictive modeling remains challenging due to the multiscale nature of structure-property relationships, where performance is governed by both local chemical environments and global pore-network topology. These complexities, combined with sparse and unevenly distributed labeled data, hinder generalization across material families. We propose the Interaction Topological Transformer (ITT), a unified data-efficient framework that leverages novel interaction topology to capture materials information across multiple scales and multiple levels, including structural, elemental, atomic, and pairwise-elemental organization. ITT extracts scale-aware features that reflect both compositional and relational structure within complex porous frameworks, and integrates them through a built-in Transformer architecture that supports joint reasoning across scales. Trained using a two-stage strategy, i.e., self-supervised pretraining on 0.6 million unlabeled structures followed by supervised fine-tuning, ITT achieves state-of-the-art, accurate, and transferable predictions for adsorption, transport, and stability properties. This framework provides a principled and scalable path for learning-guided discovery in structurally and chemically diverse porous materials.
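The two-stage training strategy ITT describes — self-supervised pretraining on unlabeled structures, then supervised fine-tuning on labeled ones — has a simple generic skeleton. The function names and callback interface below are illustrative assumptions, not the authors' code:

```python
def train_two_stage(model, unlabeled, labeled, pretrain_step, finetune_step):
    """Self-supervised pretraining followed by supervised fine-tuning.

    `pretrain_step` consumes a structure alone (e.g. a masked-reconstruction
    loss); `finetune_step` consumes a (structure, property) pair.
    """
    for structure in unlabeled:        # stage 1: representation learning
        pretrain_step(model, structure)
    for structure, label in labeled:   # stage 2: property supervision
        finetune_step(model, structure, label)
    return model
```

The point of the split is data efficiency: the expensive representation is learned from the plentiful unlabeled pool (0.6 million structures in the abstract), so the sparse labeled data only has to adapt it.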
MOFA: Discovering Materials for Carbon Capture with a GenAI- and Simulation-Based Workflow
Yan, Xiaoli, Hudson, Nathaniel, Park, Hyun, Grzenda, Daniel, Pauloski, J. Gregory, Schwarting, Marcus, Pan, Haochen, Harb, Hassan, Foreman, Samuel, Knight, Chris, Gibbs, Tom, Chard, Kyle, Chaudhuri, Santanu, Tajkhorshid, Emad, Foster, Ian, Moosavi, Mohamad, Ward, Logan, Huerta, E. A.
We present MOFA, an open-source generative AI (GenAI) plus simulation workflow for high-throughput generation of metal-organic frameworks (MOFs) on large-scale high-performance computing (HPC) systems. MOFA addresses key challenges in integrating GPU-accelerated computing for GPU-intensive GenAI tasks, including distributed training and inference, alongside CPU- and GPU-optimized tasks for screening and filtering AI-generated MOFs using molecular dynamics, density functional theory, and Monte Carlo simulations. These heterogeneous tasks are unified within an online learning framework that optimizes the utilization of available CPU and GPU resources across HPC systems. Performance metrics from a 450-node (14,400 AMD Zen 3 CPUs + 1800 NVIDIA A100 GPUs) supercomputer run demonstrate that MOFA achieves high-throughput generation of novel MOF structures, with CO$_2$ adsorption capacities ranking among the top 10 in the hypothetical MOF (hMOF) dataset. Furthermore, the production of high-quality MOFs exhibits a linear relationship with the number of nodes utilized. The modular architecture of MOFA will facilitate its integration into other scientific applications that dynamically combine GenAI with large-scale simulations.
- Health & Medicine > Pharmaceuticals & Biotechnology (0.68)
- Energy > Oil & Gas (0.68)
- Information Technology > Scientific Computing (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (0.88)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.48)
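MOFA's generate-screen-retrain loop can be sketched schematically. The staged filters below stand in for the progressively more expensive molecular dynamics, DFT, and Monte Carlo screens the abstract mentions; every name and interface here is an illustrative assumption, not MOFA's actual API:

```python
def mofa_round(generator, screeners, retrain, n):
    """One round of a generate-screen-retrain loop (schematic).

    `screeners` is an ordered list of progressively more expensive
    filters; candidates surviving every stage are fed back to
    `retrain` for online learning of the generative model.
    """
    candidates = [generator() for _ in range(n)]
    for screen in screeners:
        # Cheap filters run first, so costly stages see fewer candidates.
        candidates = [c for c in candidates if screen(c)]
    retrain(candidates)
    return candidates
```

On an HPC deployment the generator and the screens would run as concurrent GPU and CPU tasks rather than sequentially; the loop above only shows the logical dependency between stages.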
Mamba or Transformer for Time Series Forecasting? Mixture of Universals (MoU) Is All You Need
Peng, Sijia, Xiong, Yun, Zhu, Yangyong, Shen, Zhiqiang
Time series forecasting requires balancing short-term and long-term dependencies for accurate predictions. Existing methods mainly focus on long-term dependency modeling, neglecting the complexities of short-term dynamics, which may hinder performance. Transformers are superior in modeling long-term dependencies but are criticized for their quadratic computational cost. Mamba provides a near-linear alternative but is reported less effective in time series long-term forecasting due to potential information loss. Current architectures fall short in offering both high efficiency and strong performance for long-term dependency modeling. To address these challenges, we introduce Mixture of Universals (MoU), a versatile model to capture both short-term and long-term dependencies for enhancing performance in time series forecasting. MoU is composed of two novel designs: Mixture of Feature Extractors (MoF), an adaptive method designed to improve time series patch representations for short-term dependency, and Mixture of Architectures (MoA), which hierarchically integrates Mamba, FeedForward, Convolution, and Self-Attention architectures in a specialized order to model long-term dependency from a hybrid perspective. The proposed approach achieves state-of-the-art performance while maintaining relatively low computational costs. Extensive experiments on seven real-world datasets demonstrate the superiority of MoU. Code is available at https://github.com/lunaaa95/mou/.
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
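The "Mixture of Feature Extractors" idea — blending several extractors per patch with adaptive weights — reduces to softmax gating. In MoU the gate would itself be learned from the patch; here the gate scores are passed in explicitly, and all names are illustrative rather than taken from the released code:

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_of_extractors(patch, extractors, gate_scores):
    """Blend feature extractors with softmax gating weights.

    Each extractor maps a patch (list of floats) to a feature vector of
    a common length; the gate combines them adaptively per patch.
    """
    weights = softmax(gate_scores)
    features = [f(patch) for f in extractors]
    dim = len(features[0])
    return [sum(w * feat[i] for w, feat in zip(weights, features))
            for i in range(dim)]
```

Equal gate scores reduce this to a plain average; a confident gate routes the patch almost entirely through one extractor, which is what lets the model adapt its short-term representation to local dynamics.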
LLM-based MOFs Synthesis Condition Extraction using Few-Shot Demonstrations
Shi, Lei, Liu, Zhimeng, Yang, Yi, Wu, Weize, Zhang, Yuyang, Zhang, Hongbo, Lin, Jing, Wu, Siyu, Chen, Zihan, Li, Ruiming, Wang, Nan, Liu, Zipeng, Tan, Huobin, Gao, Hongyi, Zhang, Yue, Wang, Ge
The extraction of Metal-Organic Frameworks (MOFs) synthesis conditions from literature text has been challenging but crucial for the logical design of new MOFs with desirable functionality. The recent advent of large language models (LLMs) provides a disruptively new solution to this long-standing problem, and recent studies have reported over 90% F1 in extracting correct conditions from MOFs literature. We argue in this paper that most existing synthesis extraction practices with LLMs rely on primitive zero-shot learning, which can degrade extraction and application performance due to the lack of specialized knowledge. This work pioneers and optimizes the few-shot in-context learning paradigm for LLM extraction of material synthesis conditions. First, we propose a human-AI joint data curation process to secure high-quality ground-truth demonstrations for few-shot learning. Second, we apply the BM25 algorithm within a retrieval-augmented generation (RAG) framework to adaptively select few-shot demonstrations for each MOF's extraction. Over a dataset randomly sampled from 84,898 well-defined MOFs, the proposed few-shot method achieves much higher average F1 performance (0.93 vs. 0.81, +14.8%) than the native zero-shot LLM using the same GPT-4 model, under a fully automatic evaluation that is more objective than previous human evaluation. The proposed method is further validated through real-world material experiments: compared with the baseline zero-shot LLM, the proposed few-shot approach increases MOFs structural inference performance (R^2) by 29.4% on average.
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
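The BM25-based demonstration selection step can be sketched directly, since Okapi BM25 is a standard formula. The tokenization, parameter values, and function names below are illustrative choices under the usual Lucene-style idf variant, not details from the paper:

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document in `docs` against `query` with BM25."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(t for d in docs for t in set(d))  # document frequencies
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            s += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores


def select_demonstrations(query, demos, k=2):
    """Pick the k demonstrations most lexically similar to the target text."""
    scores = bm25_scores(query, demos)
    ranked = sorted(range(len(demos)), key=lambda i: -scores[i])
    return [demos[i] for i in ranked[:k]]
```

In the RAG setting described above, `query` would be the tokenized text of the paper being extracted and `demos` the curated ground-truth demonstrations; the top-k matches are then placed in the LLM prompt as few-shot examples.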