MolLangBench: A Comprehensive Benchmark for Language-Prompted Molecular Structure Recognition, Editing, and Generation

Cai, Feiyang; Bai, Jiahui; Tang, Tao; He, Guijuan; Luo, Joshua; Zhu, Tianyu; Pilla, Srikanth; Li, Gang; Liu, Ling; Luo, Feng

arXiv.org Artificial Intelligence 

The chemist begins by thoroughly analyzing the molecular structure: recognizing the core scaffold, functional groups, stereochemical configurations, and the relative positions of these structural elements. With this understanding, the chemist reasons about potential modifications, such as substituting functional groups, adjusting ring systems, or altering stereochemistry, to improve the target property. Finally, guided by precise modification instructions, the chemist applies these changes to generate an optimized molecule (illustrated in Figure 1a). In another scenario, a chemist may need to design a new (de novo) molecule to satisfy a set of property requirements. This process involves reasoning about possible molecular structures that meet the design constraints and then synthesizing a novel compound based on the resulting detailed structural specification (shown in Figure 1b).

Recent advancements in artificial intelligence (AI) provide promising opportunities to assist chemists in these complex workflows. While traditional molecular models, based on graph or sequence representations such as SMILES (Weininger, 1988), have demonstrated strong performance in domain-specific tasks including molecular property prediction (Huang et al., 2021), reaction prediction (Lowe, 2017), and conditional generation (Brown et al., 2019), they operate exclusively within the molecular modality and lack the ability to interpret or execute natural language instructions. Meanwhile, large language models (LLMs) have exhibited remarkable reasoning capabilities (OpenAI, 2025d), showing promise in tackling scientific problems that traditionally require [...]

Clemson University