MolProphecy: Bridging Medicinal Chemists' Knowledge and Molecular Pre-Trained Models via a Multi-Modal Framework
Zhao, Jianping, Zhou, Qiong, Wang, Tian, Fan, Yusi, Yang, Qian, Jiao, Li, Liu, Chang, Guo, Zhehao, Lu, Qi, Zhou, Fengfeng, Zhang, Ruochi
–arXiv.org Artificial Intelligence
MolProphecy is a human-in-the-loop (HITL) multi-modal framework designed to integrate chemists' domain knowledge into molecular property prediction models. While molecular pre-trained models have enabled significant gains in predictive accuracy, they often fail to capture the tacit, interpretive reasoning central to expert-driven molecular design. To address this, MolProphecy employs ChatGPT as a virtual chemist to simulate expert-level reasoning and decision-making. The generated chemist knowledge is embedded by the large language model (LLM) as a dedicated knowledge representation and then fused with graph-based molecular features through a gated cross-attention mechanism, enabling joint reasoning over human-derived and structural features. Evaluated on four benchmark datasets (FreeSolv, BACE, SIDER, and ClinTox), MolProphecy outperforms state-of-the-art (SOTA) models, achieving a 15.0 percent reduction in RMSE on FreeSolv and a 5.39 percent improvement in AUROC on BACE. Analysis reveals that chemist knowledge and structural features provide complementary contributions, improving both accuracy and interpretability. MolProphecy offers a practical and generalizable approach for collaborative drug discovery, with the flexibility to incorporate real chemist input in place of the current simulated proxy--without the need for model retraining. The implementation is publicly available at https://github.com/zhangruochi/MolProphecy.
arXiv.org Artificial Intelligence
Jul-8-2025
- Country:
- North America > United States (0.46)
- Asia > China (0.29)
- Genre:
- Research Report > New Finding (0.93)
- Industry:
- Technology: