MolSpectra: Pre-training 3D Molecular Representation with Multi-modal Energy Spectra

Liang Wang, Shaozhen Liu, Yu Rong, Deli Zhao, Qiang Liu, Shu Wu, Liang Wang


Published as a conference paper at ICLR 2025

Liang Wang 1,2, Shaozhen Liu 1, Yu Rong 3, Deli Zhao 3, Qiang Liu 1,2, Shu Wu 1,2, Liang Wang 1,2

1 New Laboratory of Pattern Recognition (NLPR), State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS), Institute of Automation, Chinese Academy of Sciences (CASIA)
2 School of Artificial Intelligence, University of Chinese Academy of Sciences
3 DAMO Academy, Alibaba Group

ABSTRACT

Establishing the relationship between 3D structures and the energy states of molecular systems has proven to be a promising approach for learning 3D molecular representations. However, existing methods are limited to modeling molecular energy states within classical mechanics. This limitation overlooks quantum mechanical effects, such as quantized (discrete) energy level structures, which offer a more accurate estimation of molecular energy and can be experimentally measured through energy spectra. In this paper, we propose to utilize energy spectra to enhance the pre-training of 3D molecular representations (MolSpectra), thereby infusing knowledge of quantum mechanics into the molecular representations. Specifically, we propose SpecFormer, a multi-spectrum encoder that encodes molecular spectra via masked patch reconstruction. By further aligning the outputs of the 3D encoder and the spectrum encoder with a contrastive objective, we enhance the 3D encoder's understanding of molecules. Evaluations on public benchmarks reveal that our pre-trained representations surpass existing methods in predicting molecular properties and modeling dynamics.

Given the scarcity of molecular property labels, self-supervised representation pre-training has been proposed to provide generalizable representations (Hu et al., 2020; Rong et al., 2020; Ma et al., 2024). In contrast to contrastive learning (Wang et al., 2022; Kim et al., 2022) and masked modeling (Hou et al., 2022; Liu et al., 2023c; Wang et al., 2024b) on 2D molecular graphs and molecular languages (e.g., SMILES), the design of pre-training strategies on 3D molecular geometries is more closely aligned with physical principles. Previous studies (Zaidi et al., 2023; Jiao et al., 2023) have guided representation learning through denoising of 3D molecular geometries, theoretically demonstrating that denoising 3D geometries is equivalent to learning molecular force fields, i.e., the negative gradient of the molecular potential energy with respect to atomic positions. In essence, these studies reveal that establishing the relationship between 3D geometries and the energy states of molecular systems is an effective pathway for learning 3D molecular representations.
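The equivalence invoked in the last sentence can be made concrete. The following is a brief sketch in our own notation, tracing the general denoising-score-matching argument attributed above to Zaidi et al. (2023); it is not an excerpt from the paper. If equilibrium conformations x follow a Boltzmann distribution over the potential energy surface E(x), the score of that distribution is exactly the force field:

$$ p(x) \propto \exp\bigl(-E(x)\bigr) \quad\Longrightarrow\quad \nabla_x \log p(x) = -\nabla_x E(x) = F(x). $$

Denoising score matching then states that a network $s_\theta$ trained on noised geometries $\tilde{x} = x + \sigma\epsilon$, with $\epsilon \sim \mathcal{N}(0, I)$, under the objective

$$ \min_\theta \; \mathbb{E}_{x,\,\epsilon} \left\| s_\theta(\tilde{x}) - \frac{x - \tilde{x}}{\sigma^2} \right\|^2 $$

recovers, at the optimum, the score of the noised marginal, $s_\theta^\ast(\tilde{x}) = \nabla_{\tilde{x}} \log p_\sigma(\tilde{x})$, which approaches $F(x)$ as $\sigma \to 0$. Hence predicting the denoising direction amounts to learning the force field.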
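The abstract above also names two pre-training objectives. For the first, masked patch reconstruction on spectra, a minimal PyTorch sketch follows; it assumes spectra are discretized into fixed-length 1D vectors, and all names, shapes, and ratios (mask_spectrum_patches, patch_len=16, mask_ratio=0.3) are illustrative assumptions, not the paper's implementation.

import torch

def mask_spectrum_patches(spectra, patch_len=16, mask_ratio=0.3):
    """Split 1D spectra into non-overlapping patches and corrupt a random subset.

    spectra: (B, L) tensor with L divisible by patch_len (an illustrative
    assumption). Returns the corrupted patches, the boolean mask, and the
    original patches, so an encoder-decoder can be trained to reconstruct
    the masked patches.
    """
    B, L = spectra.shape
    patches = spectra.reshape(B, L // patch_len, patch_len)  # (B, N, P)
    mask = torch.rand(B, patches.size(1)) < mask_ratio       # (B, N), True = masked
    corrupted = patches.clone()
    corrupted[mask] = 0.0                                    # zero out masked patches
    return corrupted, mask, patches

def masked_reconstruction_loss(pred, target, mask):
    # MSE restricted to masked patches, as in masked-autoencoder-style training.
    return ((pred - target)[mask] ** 2).mean()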
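For the second objective, aligning the 3D encoder's and spectrum encoder's outputs, a standard symmetric InfoNCE loss serves as a sketch; the temperature, normalization, and function name below are our assumptions rather than MolSpectra's exact formulation.

import torch
import torch.nn.functional as F

def contrastive_alignment(z_3d, z_spec, tau=0.1):
    """Symmetric InfoNCE between paired molecule-level embeddings.

    z_3d, z_spec: (B, D) embeddings from the 3D encoder and the spectrum
    encoder for the same batch of molecules. Matched rows are positives;
    all other in-batch pairings act as negatives.
    """
    z_3d = F.normalize(z_3d, dim=-1)
    z_spec = F.normalize(z_spec, dim=-1)
    logits = z_3d @ z_spec.t() / tau                          # (B, B) similarities
    targets = torch.arange(z_3d.size(0), device=z_3d.device)  # diagonal = positives
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))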