small molecule compound
Language model driven: a PROTAC generation pipeline with dual constraints of structure and property
Shao, Jinsong, Gong, Qineng, Yin, Zeyu, Chen, Yu, Hao, Yajie, Zhang, Lei, Jiang, Linlin, Yao, Min, Li, Jinlong, Wang, Fubo, Wang, Li
Imperfect modeling of ternary complexes has limited the application of computer-aided drug discovery tools in PROTAC research and development. This study developed LM-PROTAC (language model driven Proteolysis Targeting Chimera), an AI-assisted pipeline for PROTAC molecule design built around the DCT, a transformer-based generative model with dual constraints on structure and properties. The pipeline uses a fragment-based representation of molecules and proceeds in three steps. First, a language model driven protein-compound affinity model screens molecular fragments for high affinity to the target protein. Second, the structural and physicochemical properties of these fragments are constrained during generation to meet the requirements of the specific scenario. Finally, the preliminarily generated molecules undergo two rounds of screening with a multidimensional property prediction model, yielding a batch of PROTAC molecules capable of degrading disease-relevant target proteins for validation in in vitro experiments. Together, these steps provide a complete solution for AI-assisted PROTAC drug generation. Taking the key tumor target Wnt3a as an example, the LM-PROTAC pipeline successfully generated PROTAC molecules capable of inhibiting Wnt3a. The results show that the DCT can efficiently generate PROTACs that target and hydrolyse Wnt3a.
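The three steps in the abstract can be read as a staged filter. The following is a minimal sketch of that control flow only, not the authors' implementation: the `Molecule` fields, the cutoffs, and `toy_generator` (a stand-in for the DCT generative model) are all hypothetical, and the real pipeline would use learned affinity and property predictors in their place.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Molecule:
    name: str          # placeholder identifier, not a real structure
    affinity: float    # hypothetical predicted target affinity (higher is better)
    mol_weight: float  # hypothetical molecular weight in Da
    score: float       # hypothetical aggregate property score in [0, 1]


def screen_fragments(fragments: Sequence[Molecule], min_affinity: float) -> List[Molecule]:
    """Step 1: keep fragments whose predicted affinity meets the cutoff."""
    return [f for f in fragments if f.affinity >= min_affinity]


def generate_constrained(
    fragments: Sequence[Molecule],
    generator: Callable[[Molecule], List[Molecule]],
    max_weight: float,
) -> List[Molecule]:
    """Step 2: generate candidates and reject those violating the property constraint."""
    candidates = []
    for frag in fragments:
        for cand in generator(frag):
            if cand.mol_weight <= max_weight:
                candidates.append(cand)
    return candidates


def two_round_screen(candidates: Sequence[Molecule], coarse_cutoff: float, top_k: int) -> List[Molecule]:
    """Step 3: coarse threshold filter, then rank the survivors and keep the top k."""
    survivors = [c for c in candidates if c.score >= coarse_cutoff]
    return sorted(survivors, key=lambda c: c.score, reverse=True)[:top_k]


def toy_generator(frag: Molecule) -> List[Molecule]:
    """Hypothetical stand-in for a generative model: two fixed variants per fragment."""
    return [
        Molecule(frag.name + "-a", frag.affinity, frag.mol_weight + 400.0, 0.5 + frag.affinity / 2),
        Molecule(frag.name + "-b", frag.affinity, frag.mol_weight + 700.0, 0.4),
    ]


def run_pipeline(fragments: Sequence[Molecule]) -> List[Molecule]:
    """Chain the three steps with illustrative, arbitrarily chosen cutoffs."""
    kept = screen_fragments(fragments, min_affinity=0.5)
    generated = generate_constrained(kept, toy_generator, max_weight=900.0)
    return two_round_screen(generated, coarse_cutoff=0.5, top_k=5)
```

Passing the generator in as a callable keeps the screening logic independent of whichever generative model is used, so each stage can be swapped or tested in isolation.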
Four Major Benefits of AI in Drug Discovery
Among the many applications of AI technology in the pharmaceutical industry, a few are regarded as the most important and merit closer exploration. The first step in drug development is to understand the biological origin and mechanism of the disease, then to identify suitable targets using high-throughput technologies such as shRNA screening and deep sequencing, and finally to find relevant patterns across a large number of diverse data sources. This is a huge amount of work and often poses a major challenge for traditional methods. Unlike traditional methods, AI can systematically analyze existing literature and data in just a few seconds. This real-time analysis of "omics" databases can give a more accurate understanding of pathological cells and molecular mechanisms, and it can be applied to complex conditions such as neurodegenerative diseases.
Domain-Adversarial Multi-Task Framework for Novel Therapeutic Property Prediction of Compounds
Xie, Lingwei, He, Song, Yang, Shu, Feng, Boyuan, Wan, Kun, Zhang, Zhongnan, Bo, Xiaochen, Ding, Yufei
With the rapid development of high-throughput technologies, the parallel acquisition of large-scale drug-informatics data provides huge opportunities to improve pharmaceutical research and development. One significant application is purpose prediction for small molecule compounds, which aims to identify the therapeutic properties of the many purpose-unknown compounds and to find novel therapeutic properties for FDA-approved drugs so that they can be repurposed. This problem is very challenging because compound attributes are heterogeneous data with varied feature patterns, such as drug fingerprints, drug physicochemical properties, and drug perturbation gene expression. Moreover, there are complex nonlinear dependencies among these heterogeneous data. In this paper, we propose a novel domain-adversarial multi-task framework for integrating shared knowledge from multiple domains. The framework uses an adversarial strategy to learn target representations effectively and to model their nonlinear dependency. Experiments on two real-world datasets show that our approach clearly improves on competitive baselines. Most of the novel therapeutic properties we predicted for purpose-unknown compounds have already been reported or brought to the clinic. Furthermore, our framework can integrate attributes beyond the three domains examined here and can be applied in industry to screen the purpose of large numbers of as yet unidentified compounds. The source code for this paper is available on GitHub.
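The abstract does not state the training objective, so the following is only a generic sketch of how a domain-adversarial multi-task loss is commonly written, not the authors' formulation. All symbols are assumptions introduced here: $\theta_f$ for the shared feature extractor, $\theta_k$ for the $k$-th of $K$ task heads, $\theta_d$ for a domain discriminator that tries to tell which domain a representation came from, and $\lambda > 0$ for the trade-off weight.

```latex
\begin{aligned}
E(\theta_f, \{\theta_k\}, \theta_d)
  &= \sum_{k=1}^{K} \mathcal{L}_{\text{task}}^{(k)}(\theta_f, \theta_k)
   \;-\; \lambda\, \mathcal{L}_{\text{dom}}(\theta_f, \theta_d), \\
(\hat{\theta}_f, \{\hat{\theta}_k\})
  &= \arg\min_{\theta_f, \{\theta_k\}} E(\theta_f, \{\theta_k\}, \hat{\theta}_d), \\
\hat{\theta}_d
  &= \arg\max_{\theta_d} E(\hat{\theta}_f, \{\hat{\theta}_k\}, \theta_d).
\end{aligned}
```

Under this reading, the discriminator minimizes its own loss $\mathcal{L}_{\text{dom}}$ while the shared extractor is pushed to increase it, which encourages representations that carry knowledge common to all domains.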