Towards 3D Molecule-Text Interpretation in Language Models
Li, Sihang, Liu, Zhiyuan, Luo, Yanchen, Wang, Xiang, He, Xiangnan, Kawaguchi, Kenji, Chua, Tat-Seng, Tian, Qi
arXiv.org Artificial Intelligence
Language Models (LMs) have greatly influenced diverse domains. However, their inherent limitation in comprehending 3D molecular structures has considerably constrained their potential in the biomolecular domain. To bridge this gap, we focus on 3D molecule-text interpretation and propose 3D-MoLM: 3D-Molecular Language Modeling. Specifically, 3D-MoLM enables an LM to interpret and analyze 3D molecules by equipping the LM with a 3D molecular encoder. This integration is achieved by a 3D molecule-text projector, which bridges the 3D molecular encoder's representation space and the LM's input space. Moreover, to enhance 3D-MoLM's capacity for cross-modal molecular understanding and instruction following, we meticulously curated a 3D molecule-centric instruction tuning dataset, 3D-MoIT. Through 3D molecule-text alignment and 3D molecule-centric instruction tuning, 3D-MoLM integrates the 3D molecular encoder with the LM.

The advancement of Language Models (LMs) (Devlin et al., 2019; OpenAI, 2023b; Touvron et al., 2023a) has triggered a series of remarkable innovations across multiple disciplines (Zhao et al., 2023). Notably, by pretraining on extensive biochemical literature, LMs excel at text-based molecule understanding tasks, such as question-answering (QA) in the chemical and medical domains (Taylor et al., 2022). Recognizing the potential of LMs to harness extensive biochemical knowledge for molecule-relevant tasks, molecule-text modeling has emerged as a new research direction (Edwards et al., 2021; 2022). Previous works have been dedicated to harmonizing texts with 1D molecular sequences (Zeng et al., 2022; Taylor et al., 2022) and 2D molecular graphs (Su et al., 2022; Liu et al., 2022a), aiding in tasks like molecule-text retrieval and molecule captioning.
However, these works mostly overlook 3D molecular structures, which are crucial to understanding molecular dynamics, protein-ligand interactions, enzymatic functions, and a range of other biomolecular phenomena (Karplus & McCammon, 2002; Jorgensen, 2004). To bridge this gap, we focus on 3D molecule-text interpretation, with the goal of enabling an LM to interpret and analyze 3D molecular structures through text generation. Given the recent successes of 3D molecular encoders in tasks like molecule property prediction, docking, and conformation prediction (Zhou et al., 2023; Lu et al., 2023; Fang et al., 2022), it is promising to incorporate one as an LM's perception module for 3D molecules.
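The idea of a projector that bridges the encoder's representation space and the LM's input space can be illustrated with a toy sketch. The text above does not specify the projector's form, so a single linear map is assumed here purely for illustration; it is not the authors' implementation. All names (`make_projector`, `project`, `build_lm_input`) and dimensions are hypothetical, and the weights are random rather than trained.

```python
import random

def make_projector(d_mol, d_lm, seed=0):
    """Random (d_mol x d_lm) weight matrix; a hypothetical stand-in for a trained projector."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(d_lm)] for _ in range(d_mol)]

def project(mol_tokens, weights):
    """Map each encoder embedding (length d_mol) to the LM's input width (length d_lm)."""
    d_lm = len(weights[0])
    return [
        [sum(x * weights[i][j] for i, x in enumerate(tok)) for j in range(d_lm)]
        for tok in mol_tokens
    ]

def build_lm_input(mol_tokens, text_embeddings, weights):
    """Prepend projected molecule tokens to the text token embeddings."""
    return project(mol_tokens, weights) + text_embeddings

# Toy dimensions, chosen arbitrarily for the example.
D_MOL, D_LM = 4, 6
rng = random.Random(1)
mol_tokens = [[rng.random() for _ in range(D_MOL)] for _ in range(3)]      # 3 molecule embeddings
text_embeddings = [[rng.random() for _ in range(D_LM)] for _ in range(5)]  # 5 text token embeddings
weights = make_projector(D_MOL, D_LM)

lm_input = build_lm_input(mol_tokens, text_embeddings, weights)
print(len(lm_input), len(lm_input[0]))  # 8 6
```

In this sketch the LM would consume the 8 resulting vectors as one sequence: three projected molecule tokens followed by five text tokens, all of width `D_LM`.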
Jan-24-2024