AITopics | Lai, Houtim

Lai, Houtim

Instruction Multi-Constraint Molecular Generation Using a Teacher-Student Large Language Model

Zhou, Peng, Wang, Jianmin, Li, Chunyan, Wang, Zixu, Liu, Yiping, Sun, Siqi, Lin, Jianxin, Wei, Leyi, Cai, Xibao, Lai, Houtim, Liu, Wei, Wang, Longyue, Zeng, Xiangxiang

arXiv.org Artificial Intelligence, Jul-10-2024

While various models and computational tools have been proposed for structure and property analysis of molecules, generating molecules that conform to all desired structures and properties remains a challenge. Here, we introduce a multi-constraint molecular generation large language model, TSMMG, which, akin to a student, incorporates knowledge from various small models and tools, namely, the 'teachers'. To train TSMMG, we construct a large set of text-molecule pairs by extracting molecular knowledge from these 'teachers', enabling it to generate novel molecules that conform to the descriptions given in various text prompts. We experimentally show that TSMMG performs remarkably well in generating molecules that meet complex property requirements described in natural language across two-, three-, and four-constraint tasks, with an average molecular validity of over 99% and success ratios of 82.58%, 68.03%, and 67.48%, respectively. The model also exhibits adaptability in zero-shot testing, creating molecules that satisfy combinations of properties it has not encountered. It can comprehend text inputs in various language styles, extending beyond the confines of the outlined prompts, as confirmed through empirical validation. Additionally, the knowledge distillation feature of TSMMG contributes to the continuous enhancement of small models, while the approach to dataset construction addresses the issues of data scarcity and quality, which positions TSMMG as a promising tool in the domains of drug discovery and materials science.

large language model, machine learning, natural language, (18 more...)

arXiv:2403.13244

Country: Asia > China (0.68)

Genre:

Research Report > New Finding (1.00)
Research Report > Promising Solution (0.66)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

DrugAssist: A Large Language Model for Molecule Optimization

Ye, Geyan, Cai, Xibao, Lai, Houtim, Wang, Xing, Huang, Junhong, Wang, Longyue, Liu, Wei, Zeng, Xiangxiang

arXiv.org Artificial Intelligence, Dec-28-2023

Recently, the impressive performance of large language models (LLMs) on a wide range of tasks has attracted an increasing number of attempts to apply LLMs in drug discovery. However, molecule optimization, a critical task in the drug discovery pipeline, has so far seen little involvement from LLMs. Most existing approaches focus solely on capturing the underlying patterns in chemical structures provided by the data, without taking advantage of expert feedback. These non-interactive approaches overlook the fact that drug discovery requires the integration of expert experience and iterative refinement. To address this gap, we propose DrugAssist, an interactive molecule optimization model which performs optimization through human-machine dialogue by leveraging the strong interactivity and generalizability of LLMs. DrugAssist has achieved leading results in both single and multiple property optimization, while also showing strong potential in transferability and iterative optimization. In addition, we publicly release a large instruction-based dataset called "MolOpt-Instructions" for fine-tuning language models on molecule optimization tasks.

Figure 1: Illustration of the proposed DrugAssist model framework, which focuses on optimizing molecules through human-machine dialogue.

Recently, generative artificial intelligence has made remarkable strides in the field of natural language processing (NLP), particularly with the advent of Large Language Models (LLMs) such as GPT (Generative Pre-trained Transformer) (Radford et al., 2019). These models have demonstrated impressive capabilities in a wide range of tasks, extending far beyond everyday communication and question-answering scenarios.

large language model, machine learning, natural language, (18 more...)

arXiv:2401.10334

Genre: Research Report (0.50)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.89)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.34)

DrugOOD: Out-of-Distribution (OOD) Dataset Curator and Benchmark for AI-aided Drug Discovery -- A Focus on Affinity Prediction Problems with Noise Annotations

Ji, Yuanfeng, Zhang, Lu, Wu, Jiaxiang, Wu, Bingzhe, Huang, Long-Kai, Xu, Tingyang, Rong, Yu, Li, Lanqing, Ren, Jie, Xue, Ding, Lai, Houtim, Xu, Shaoyong, Feng, Jing, Liu, Wei, Luo, Ping, Zhou, Shuigeng, Huang, Junzhou, Zhao, Peilin, Bian, Yatao

arXiv.org Artificial Intelligence, Jan-24-2022

AI-aided drug discovery (AIDD) is gaining increasing popularity due to its promise of making the search for new pharmaceuticals quicker, cheaper and more efficient. In spite of its extensive use in many fields, such as ADMET prediction, virtual screening, protein folding and generative chemistry, little has been explored in terms of the out-of-distribution (OOD) learning problem with noise, which is inevitable in real-world AIDD applications. In this work, we present DrugOOD, a systematic OOD dataset curator and benchmark for AI-aided drug discovery, which comes with an open-source Python package that fully automates the data curation and OOD benchmarking processes. We focus on one of the most crucial problems in AIDD: drug target binding affinity prediction, which involves both a macromolecule (the protein target) and a small molecule (the drug compound). In contrast to only providing fixed datasets, DrugOOD offers an automated dataset curator with user-friendly customization scripts, rich domain annotations aligned with biochemistry knowledge, realistic noise annotations and rigorous benchmarking of state-of-the-art OOD algorithms. Since molecular data is often modeled as irregular graphs using graph neural network (GNN) backbones, DrugOOD also serves as a valuable testbed for graph OOD learning problems. Extensive empirical studies have shown a significant performance gap between in-distribution and out-of-distribution experiments, which highlights the need to develop better schemes that allow for OOD generalization under noise for AIDD.

artificial intelligence, health & medicine, machine learning, (18 more...)

arXiv:2201.09637

Country:

Asia (0.28)
North America > Canada > Quebec (0.14)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)
Education > Focused Education > Special Education (0.44)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
