3DMolFormer: A Dual-channel Framework for Structure-based Drug Discovery

Hu, Xiuyuan, Liu, Guoqing, Chen, Can, Zhao, Yang, Zhang, Hao, Liu, Xue

arXiv.org Artificial Intelligence 

Structure-based drug discovery, encompassing the tasks of protein-ligand docking and pocket-aware 3D drug design, represents a core challenge in drug discovery. However, no existing work addresses both tasks in a way that effectively leverages the duality between them, and current methods for each task are hindered by challenges in modeling 3D information and by the limitations of available data. To address these issues, we propose 3DMolFormer, a unified dual-channel transformer-based framework applicable to both docking and 3D drug design tasks, which exploits their duality by utilizing docking functionalities within the drug design process. Specifically, we represent 3D pocket-ligand complexes using parallel sequences of discrete tokens and continuous numbers, and we design a corresponding dual-channel transformer model to handle this format, thereby overcoming the challenges of 3D information modeling. Additionally, we alleviate data limitations through large-scale pre-training on a mixed dataset, followed by supervised and reinforcement learning fine-tuning techniques tailored to the two tasks, respectively. Experimental results demonstrate that 3DMolFormer outperforms previous approaches in both protein-ligand docking and pocket-aware 3D drug design, highlighting its promising application in structure-based drug discovery.

These developments hold the promise of dramatically enhancing the efficiency of drug development processes (Blanco-Gonzalez et al., 2023). Structure-based drug discovery (SBDD) is one of the most critical strategies in drug discovery practices, relying on theories of drug-receptor interactions to study the complexes formed between protein pockets and small molecule ligands (Van Montfort & Workman, 2017).
SBDD encompasses two core tasks: (1) protein-ligand binding pose prediction (docking), which involves predicting the 3D binding conformation of a ligand given the 3D structure of a protein and the 2D representation of the ligand (Yang et al., 2022), and (2) pocket-aware 3D drug design, which entails designing 3D drug molecules that bind well (with low binding energy) to a given pocket target on a protein. These two tasks are inherently dual: one is predictive, while the other is generative. However, the application of machine learning to these two SBDD tasks remains widely recognized as a challenge (Pala & Clark, 2024).
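
To make the idea of "parallel sequences of discrete tokens and continuous numbers" from the abstract more concrete, the following is a minimal illustrative sketch in Python. It is not the 3DMolFormer implementation: the placeholder token `<num>`, the padding value, the one-token-plus-three-coordinates layout, and the names `DualChannelSequence`, `encode_atoms`, and `decode_atoms` are all assumptions introduced here for illustration only.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical symbols; the real vocabulary is not specified in this section.
NUM_TOKEN = "<num>"  # token-channel placeholder: "a number lives at this position"
PAD_VALUE = 0.0      # number-channel filler for positions that hold a real token

Atom = Tuple[str, Tuple[float, float, float]]


@dataclass
class DualChannelSequence:
    tokens: List[str]     # discrete channel
    numbers: List[float]  # continuous channel, aligned position by position


def encode_atoms(atoms: List[Atom]) -> DualChannelSequence:
    """Flatten (element, xyz) pairs into two parallel, equal-length sequences."""
    tokens: List[str] = []
    numbers: List[float] = []
    for element, xyz in atoms:
        # The element symbol goes in the token channel; the number channel is padded.
        tokens.append(element)
        numbers.append(PAD_VALUE)
        # Each coordinate goes in the number channel under a placeholder token.
        for coord in xyz:
            tokens.append(NUM_TOKEN)
            numbers.append(float(coord))
    return DualChannelSequence(tokens, numbers)


def decode_atoms(seq: DualChannelSequence) -> List[Atom]:
    """Invert encode_atoms, assuming the fixed 1 token + 3 numbers layout."""
    atoms: List[Atom] = []
    for i in range(0, len(seq.tokens), 4):
        x, y, z = seq.numbers[i + 1 : i + 4]
        atoms.append((seq.tokens[i], (x, y, z)))
    return atoms
```

In this sketch, a model with two output heads could predict the token channel with a classification loss and the number channel with a regression loss, which is one plausible reading of "dual-channel"; the details of the actual model are not given in this section.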