A Generalist Cross-Domain Molecular Learning Framework for Structure-Based Drug Discovery
Zhu, Yiheng, Li, Mingyang, Liu, Junlong, Fu, Kun, Wu, Jiansheng, Li, Qiuyi, Yin, Mingze, Ye, Jieping, Wu, Jian, Wang, Zheng
arXiv.org Artificial Intelligence
Structure-based drug discovery (SBDD) is a systematic scientific process that develops new drugs by leveraging the detailed physical structure of the target protein. Recent advances in pre-trained models for biomolecules have demonstrated remarkable success across various biochemical applications, including drug discovery and protein engineering. However, most of these pre-trained models focus on the characteristics of either small molecules or proteins alone, without modeling their binding interactions, the cross-domain relationships that are pivotal to SBDD. To fill this gap, we propose a general-purpose foundation model named BIT (an abbreviation for Biomolecular Interaction Transformer), which can encode a range of biochemical entities, including small molecules, proteins, and protein-ligand complexes, as well as various data formats, encompassing both 2D and 3D structures. Specifically, we introduce Mixture-of-Domain-Experts (MoDE) to handle biomolecules from diverse biochemical domains and Mixture-of-Structure-Experts (MoSE) to capture positional dependencies in molecular structures. The proposed mixture-of-experts approach enables BIT to achieve both deep fusion and domain-specific encoding, effectively capturing fine-grained molecular interactions within protein-ligand complexes. We then perform cross-domain pre-training on the shared Transformer backbone via several unified self-supervised denoising tasks. Experimental results on various benchmarks demonstrate that BIT achieves exceptional performance in downstream tasks, including binding affinity prediction, structure-based virtual screening, and molecular property prediction.
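To make the idea of combining deep fusion with domain-specific encoding concrete, the following is a minimal sketch and not the authors' implementation. It assumes that every token carries a known domain label, that attention is shared across all tokens of a protein-ligand complex, and that each token is then sent to a feed-forward expert chosen by its label. The abstract does not specify how MoDE routes tokens, so this hard routing rule is an assumption, and all names here (`shared_attention`, `make_expert`, `mode_block`) are hypothetical.

```python
import math
import random

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def shared_attention(tokens):
    """Single-head dot-product self-attention with identity projections.

    Every token attends to every other token regardless of domain,
    which is the 'fusion' part of this sketch.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out

def make_expert(d, hidden, rng):
    """Build a two-layer ReLU feed-forward expert with random weights."""
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(hidden)]
    W2 = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(d)]
    def expert(x):
        h = [max(0.0, v) for v in matvec(W1, x)]
        return matvec(W2, h)
    return expert

def mode_block(tokens, domains, experts):
    """One block: shared attention, then a per-domain feed-forward expert.

    Assumption: routing is a hard lookup on the token's domain label.
    """
    mixed = shared_attention(tokens)
    out = []
    for x, a, dom in zip(tokens, mixed, domains):
        h = [xi + ai for xi, ai in zip(x, a)]          # residual around attention
        f = experts[dom](h)                            # domain-specific encoding
        out.append([hi + fi for hi, fi in zip(h, f)])  # residual around expert
    return out

rng = random.Random(0)
d = 4
experts = {"molecule": make_expert(d, 8, rng), "protein": make_expert(d, 8, rng)}
tokens = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(5)]
domains = ["molecule", "molecule", "protein", "protein", "protein"]
output = mode_block(tokens, domains, experts)
```

In this sketch the attention step never looks at the domain labels, so relabeling one token changes only that token's output. Ligand and protein tokens still exchange information through the shared attention step.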
Mar-6-2025
- Country:
- Asia > China
- Zhejiang Province (0.14)
- North America > United States (0.46)
- Genre:
- Research Report (1.00)
- Industry:
- Technology:
- Information Technology
- Artificial Intelligence
- Machine Learning
- Neural Networks > Deep Learning (0.93)
- Performance Analysis > Accuracy (0.92)
- Statistical Learning (1.00)
- Natural Language (1.00)
- Representation & Reasoning (1.00)
- Data Science > Data Mining (1.00)