
Collaborating Authors

Diao, Ying


FARM: Functional Group-Aware Representations for Small Molecules

arXiv.org Artificial Intelligence

We introduce Functional Group-Aware Representations for Small Molecules (FARM), a novel foundation model designed to bridge the gap between SMILES, natural language, and molecular graphs. The key innovation of FARM lies in its functional group-aware tokenization, which directly incorporates functional group information into the representations. This strategic reduction in tokenization granularity is intentionally aligned with key drivers of functional properties (i.e., functional groups), enhancing the model's understanding of chemical language. By expanding the chemical lexicon, FARM more effectively bridges SMILES and natural language, ultimately advancing the model's capacity to predict molecular properties. FARM also represents molecules from two perspectives: using masked language modeling to capture atom-level features, and using graph neural networks to encode whole-molecule topology. We rigorously evaluate FARM on the MoleculeNet benchmark, where it achieves state-of-the-art performance on 10 out of 12 tasks. These results highlight FARM's potential to improve molecular representation learning, with promising applications in drug discovery and pharmaceutical research.

Artificial intelligence (AI) has emerged as a transformative tool for accelerating scientific discovery, particularly in drug development. However, a central challenge in this field is the scarcity of the large labeled datasets required by traditional supervised learning methods. This has shifted the focus toward self-supervised pre-trained models that can extract meaningful patterns from vast amounts of unlabeled molecular data (Shen & Nicolaou, 2019). As a result, the development of robust foundation models for molecular representation is now more critical than ever. Despite significant advances in other domains, such as natural language processing (NLP) and computer vision, there is still no dominant foundation model tailored to molecular representation for drug discovery (Zhang et al., 2023b). This paper begins to address this pressing gap by introducing an approach that leverages functional group (FG)-aware tokenization in the context of both sequence-based and graph-based molecular representations.
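To make the tokenization idea concrete, the sketch below shows what functional-group-aware tokenization might look like in practice: atoms matched by a small SMARTS library are tagged with their functional group before being emitted as tokens. This is a minimal illustration built on RDKit; the pattern library, token format, and `fg_aware_tokens` helper are assumptions for exposition, not FARM's actual vocabulary or pipeline.

```python
# Minimal sketch of functional-group-aware tokenization (illustrative only).
from rdkit import Chem

# Hypothetical FG library: name -> SMARTS. A real vocabulary would be far larger.
FG_SMARTS = {
    "carboxylic_acid": "C(=O)[OH]",
    "amine": "[NX3;H2,H1;!$(NC=O)]",
    "hydroxyl": "[OX2H]",
}

def fg_aware_tokens(smiles: str):
    """Tag each atom with the functional group it belongs to (if any),
    then emit one token per atom of the form SYMBOL_FGNAME."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Unparsable SMILES: {smiles}")
    fg_of_atom = {}
    for name, smarts in FG_SMARTS.items():
        patt = Chem.MolFromSmarts(smarts)
        for match in mol.GetSubstructMatches(patt):
            for idx in match:
                # First (most specific) pattern wins for overlapping matches.
                fg_of_atom.setdefault(idx, name)
    tokens = []
    for atom in mol.GetAtoms():
        fg = fg_of_atom.get(atom.GetIdx())
        tokens.append(f"{atom.GetSymbol()}_{fg}" if fg else atom.GetSymbol())
    return tokens

print(fg_aware_tokens("CC(=O)O"))  # acetic acid
# e.g. ['C', 'C_carboxylic_acid', 'O_carboxylic_acid', 'O_carboxylic_acid']
```

Coarsening tokens this way ties the vocabulary to chemically meaningful units, so a downstream masked language model sees "carboxylic-acid carbon" rather than a bare carbon atom.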


GLaD: Synergizing Molecular Graphs and Language Descriptors for Enhanced Power Conversion Efficiency Prediction in Organic Photovoltaic Devices

arXiv.org Artificial Intelligence

This paper presents GLaD (synergizing molecular Graphs and Language Descriptors), a novel approach for predicting the Power Conversion Efficiency (PCE) of Organic Photovoltaic (OPV) devices. Because high-quality experimental data are scarce, we collect a dataset of 500 pairs of OPV donor and acceptor molecules with their corresponding PCE values, which serves as training data for our predictive model. In this low-data regime, GLaD leverages properties learned by large language models (LLMs) pretrained on extensive scientific literature to enrich molecular structural representations, yielding a multimodal representation of each molecule. GLaD achieves precise PCE predictions, thereby facilitating the synthesis of new OPV molecules with improved efficiency. Furthermore, GLaD is versatile: it applies to a range of molecular property prediction tasks (BBBP, BACE, ClinTox, and SIDER) beyond OPV materials. In particular, GLaD proves valuable for low-data tasks across the chemical space, as it enriches molecular representations with property descriptions learned during large-scale pretraining. This capability is significant in real-world scientific endeavors such as drug and material discovery, where access to comprehensive data is crucial for informed decision-making and efficient exploration of the chemical space.
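As a rough illustration of the multimodal idea, the sketch below fuses a molecular graph embedding with an LLM-derived text embedding by concatenation and regresses a scalar PCE with a small MLP. The embedding dimensions, the concatenation fusion, and the `FusionPCERegressor` head are assumptions chosen for clarity, not GLaD's published architecture.

```python
# Minimal sketch of graph + language fusion for PCE regression (illustrative only).
import torch
import torch.nn as nn

class FusionPCERegressor(nn.Module):
    def __init__(self, graph_dim: int = 300, text_dim: int = 768, hidden: int = 256):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(graph_dim + text_dim, hidden),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(hidden, 1),  # scalar PCE prediction
        )

    def forward(self, graph_emb: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        # graph_emb: (batch, graph_dim) from a pretrained GNN encoder
        # text_emb:  (batch, text_dim) from an LLM over property descriptions
        fused = torch.cat([graph_emb, text_emb], dim=-1)
        return self.head(fused).squeeze(-1)

# Usage with random stand-in embeddings for a batch of donor/acceptor pairs:
model = FusionPCERegressor()
g = torch.randn(8, 300)   # e.g., GNN readout of the donor-acceptor pair
t = torch.randn(8, 768)   # e.g., LLM embedding of the molecules' descriptions
print(model(g, t).shape)  # torch.Size([8])
```

Concatenation is the simplest fusion scheme; in a low-data regime like the 500-pair OPV dataset, the text branch supplies prior knowledge that the graph branch alone cannot learn from so few labels.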