DrugAssist: A Large Language Model for Molecule Optimization
Ye, Geyan, Cai, Xibao, Lai, Houtim, Wang, Xing, Huang, Junhong, Wang, Longyue, Liu, Wei, Zeng, Xiangxiang
–arXiv.org Artificial Intelligence
Recently, the impressive performance of large language models (LLMs) on a wide range of tasks has attracted an increasing number of attempts to apply LLMs in drug discovery. However, molecule optimization, a critical task in the drug discovery pipeline, has so far seen little involvement from LLMs. Most existing approaches focus solely on capturing the underlying patterns in chemical structures provided by the data, without taking advantage of expert feedback. These non-interactive approaches overlook the fact that drug discovery requires the integration of expert experience and iterative refinement. To address this gap, we propose DrugAssist, an interactive molecule optimization model that performs optimization through human-machine dialogue by leveraging the strong interactivity and generalizability of LLMs. DrugAssist has achieved leading results in both single and multiple property optimization, while also showing immense potential in transferability and iterative optimization. In addition, we publicly release a large instruction-based dataset called "MolOpt-Instructions" for fine-tuning language models on molecule optimization tasks.
Figure 1: Illustration of our proposed DrugAssist model framework, which focuses on optimizing molecules through human-machine dialogue.
Recently, generative artificial intelligence has made remarkable strides in the field of natural language processing (NLP), particularly with the advent of Large Language Models (LLMs) such as GPT (Generative Pre-trained Transformer) (Radford et al., 2019). These models have demonstrated impressive capabilities in a wide range of tasks, extending far beyond everyday communication and question-answering scenarios.
Dec-28-2023