Stiefel Flow Matching for Moment-Constrained Structure Elucidation

Austin Cheng, Alston Lo, Kin Long Kelvin Lee, Santiago Miret, Alán Aspuru-Guzik

arXiv.org Artificial Intelligence 

Molecular structure elucidation is a fundamental step in understanding chemical phenomena, with applications in identifying molecules in natural products, lab syntheses, forensic samples, and the interstellar medium. We consider the task of predicting a molecule's all-atom 3D structure given only its molecular formula and moments of inertia, motivated by the ability of rotational spectroscopy to measure these moments. While existing generative models can conditionally sample 3D structures with approximately correct moments, this soft conditioning fails to leverage the many digits of precision afforded by experimental rotational spectroscopy. To address this, we first show that the space of n-atom point clouds with a fixed set of moments of inertia is embedded in the Stiefel manifold St(n, 4). We then propose Stiefel Flow Matching as a generative model for elucidating 3D structure under exact moment constraints. Additionally, we learn simpler and shorter flows by finding approximate solutions for equivariant optimal transport on the Stiefel manifold. Empirically, enforcing exact moment constraints allows Stiefel Flow Matching to achieve higher success rates and faster sampling than Euclidean diffusion models, even on high-dimensional manifolds corresponding to large molecules in the GEOM dataset.

Elucidating the structure of unknown molecules is a central task in chemistry, important for analyzing environmental samples (Moneta et al., 2023), identifying novel drugs (Sonstrom et al., 2023), and determining potential building blocks of life in the interstellar medium (McGuire et al., 2016). The challenge is to aggregate information from multiple sources of analytical data to unambiguously determine a molecule's structure. Rotational spectroscopy holds a unique capacity to provide precise measurements of a molecule's rotational constants, which are closely related to its moments of inertia. In turn, the connection between these moments and 3D structure has routinely provided the highest quality gas-phase 3D structures attainable from experiment (Domingos et al., 2020). Typically, structure elucidation with rotational spectroscopy proceeds by confirming whether a known structure's moments match with experiment (Lee & McCarthy, 2019; McCarthy et al., 2020). However, this approach is inherently restricted to molecules whose structures have already been catalogued, and leaves no prescription for undiscovered molecules such as novel natural products and key reactive intermediate species that cannot be easily isolated (Womack et al., 2015).
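The claim that point clouds with fixed moments sit inside St(n, 4) can be checked numerically. The sketch below is one way to see it and is not necessarily the authors' construction. It assumes a non-planar molecule with at least four atoms, so that all three planar moments are strictly positive. The function names (`principal_frame`, `stiefel_point`, `moments_of_inertia`) and the random test molecule are illustrative assumptions. After centering at the center of mass and rotating into the principal axes, the three mass-weighted coordinate columns, each scaled by the square root of its planar moment, together with the normalized square-root mass vector, form an n × 4 matrix with orthonormal columns.

```python
import numpy as np

def principal_frame(masses, coords):
    """Center at the center of mass and rotate into the principal axes."""
    m = np.asarray(masses, dtype=float)
    x = np.asarray(coords, dtype=float)
    x = x - (m[:, None] * x).sum(axis=0) / m.sum()
    # Mass-weighted second-moment matrix (3x3); its eigenvalues are the
    # planar moments P_k = sum_i m_i * x_ik^2 in the principal frame.
    second = (m[:, None] * x).T @ x
    planar, axes = np.linalg.eigh(second)
    return m, x @ axes, planar

def stiefel_point(masses, coords):
    """Map a point cloud to an n x 4 matrix with orthonormal columns.

    Assumes all planar moments are positive (non-planar molecule).
    """
    m, x, planar = principal_frame(masses, coords)
    sqrt_m = np.sqrt(m)
    u_xyz = sqrt_m[:, None] * x / np.sqrt(planar)   # three coordinate columns
    u_mass = sqrt_m / np.sqrt(m.sum())              # center-of-mass column
    return np.column_stack([u_xyz, u_mass])

def moments_of_inertia(planar):
    """Principal moments of inertia from planar moments: I_k = sum(P) - P_k."""
    return planar.sum() - planar

# Hypothetical 6-atom "molecule": random coordinates and masses, for illustration only.
rng = np.random.default_rng(0)
masses = rng.uniform(1.0, 16.0, size=6)
coords = rng.normal(size=(6, 3))

U = stiefel_point(masses, coords)
_, _, planar = principal_frame(masses, coords)
gram_error = np.abs(U.T @ U - np.eye(4)).max()  # should be near machine precision
```

The fourth column encodes the center-of-mass condition: its orthogonality to the other three is exactly the statement that the mass-weighted coordinates sum to zero. Fixing the moments fixes the column scalings, so the remaining freedom in the structure is the choice of a point on the Stiefel manifold, subject to that fourth column being fixed by the molecular formula.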