FragFM: Efficient Fragment-Based Molecular Generation via Discrete Flow Matching

Lee, Joongwon, Kim, Seonghwan, Kim, Wou Youn

arXiv.org Artificial Intelligence 

ABSTRACT

We introduce FragFM, a novel fragment-based discrete flow matching framework for molecular graph generation. FragFM generates molecules at the fragment level, leveraging a coarse-to-fine autoencoding mechanism to reconstruct atom-level details. This approach reduces computational complexity while maintaining high chemical validity, enabling more efficient and scalable molecular generation. Notably, FragFM achieves over 99% validity with significantly fewer sampling steps, improving scalability while preserving molecular diversity. These results highlight the potential of fragment-based generative modeling for large-scale, property-aware molecular design, paving the way for more efficient exploration of chemical space.

1 INTRODUCTION

Deep generative models, such as diffusion and flow matching, have demonstrated remarkable success across domains like images (Nichol et al., 2021; Rombach et al., 2022; Ho et al., 2020), text (Li et al., 2022), and videos (Hu & Xu, 2023; Ho et al., 2022). Recently, their application to molecular graph generation has gained attention, where they aim to generate chemically valid molecules by leveraging the structural properties of molecular graphs (Jo et al., 2022; Vignac et al., 2022; Qin et al., 2024).

However, existing atom-based generative models face scalability challenges, particularly in generating large and complex molecules. The quadratic growth of edges as graph size increases results in computational inefficiencies. At the same time, the inherent sparsity of chemical bonds makes accurate edge prediction more complex, often leading to unrealistic molecular structures or invalid connectivity constraints (Qin et al., 2023; Chen et al., 2023). Additionally, graph neural networks (GNNs) struggle to capture topological features such as rings and loops, leading to deviations from chemically valid structures.
While various methods incorporate auxiliary features (e.g., spectral, ring, and valency information) to mitigate these issues, they do not fully resolve the sparsity and scalability bottlenecks (Vignac et al., 2022).
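
The scaling argument above can be made concrete with a little arithmetic. The sketch below is illustrative only and is not taken from the FragFM implementation: it assumes a connected molecular graph, for which the bond count equals atoms minus one plus the number of independent rings, and it uses a hypothetical ring count and a hypothetical average fragment size (`atoms_per_fragment`) purely to show the trend.

```python
import math


def candidate_pairs(n_nodes: int) -> int:
    """Unordered node pairs that a dense edge predictor has to score."""
    return n_nodes * (n_nodes - 1) // 2


def bond_count(n_atoms: int, n_rings: int = 0) -> int:
    """Edges in a connected graph with n_rings independent cycles."""
    return n_atoms - 1 + n_rings


def bond_density(n_atoms: int, n_rings: int = 0) -> float:
    """Fraction of candidate atom pairs that are actual bonds."""
    return bond_count(n_atoms, n_rings) / candidate_pairs(n_atoms)


def fragment_pairs(n_atoms: int, atoms_per_fragment: int) -> int:
    """Candidate pairs after grouping atoms into fragments.

    atoms_per_fragment is a hypothetical average size, not a value
    reported in the paper.
    """
    n_fragments = math.ceil(n_atoms / atoms_per_fragment)
    return candidate_pairs(n_fragments)


if __name__ == "__main__":
    for n in (10, 50, 200):
        rings = n // 10  # assumed ring count, for illustration only
        print(
            f"atoms={n:4d}  atom_pairs={candidate_pairs(n):6d}  "
            f"bonds={bond_count(n, rings):4d}  "
            f"density={bond_density(n, rings):.3f}  "
            f"fragment_pairs={fragment_pairs(n, 5):5d}"
        )
```

Under these assumptions, the number of candidate atom pairs grows quadratically while the number of bonds grows only linearly, so the fraction of positive edges shrinks as molecules get larger. Working with fewer, larger nodes reduces the number of pairs to score, which is the intuition behind modeling at the fragment level.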
