
Collaborating Authors: Burke, Martin D.


FARM: Functional Group-Aware Representations for Small Molecules

arXiv.org Artificial Intelligence

We introduce Functional Group-Aware Representations for Small Molecules (FARM), a novel foundation model designed to bridge the gap between SMILES, natural language, and molecular graphs. The key innovation of FARM lies in its functional group-aware tokenization, which directly incorporates functional group information into the representations. This strategic reduction in tokenization granularity is intentionally aligned with key drivers of functional properties (i.e., functional groups), enhancing the model's understanding of chemical language. By expanding the chemical lexicon, FARM more effectively bridges SMILES and natural language, ultimately advancing the model's capacity to predict molecular properties. FARM also represents molecules from two perspectives: by using masked language modeling to capture atom-level features and by employing graph neural networks to encode the whole molecule topology. We rigorously evaluate FARM on the MoleculeNet dataset, where it achieves state-of-the-art performance on 10 out of 12 tasks. These results highlight FARM's potential to improve molecular representation learning, with promising applications in drug discovery and pharmaceutical research.

Artificial intelligence (AI) has emerged as a transformative tool in accelerating scientific discovery, particularly in drug development. However, one of the central challenges in this field is the scarcity of large labeled datasets required for traditional supervised learning methods. This has shifted the focus towards self-supervised pre-trained models that can extract meaningful patterns from vast amounts of unlabeled molecular data (Shen & Nicolaou, 2019). As a result, the development of robust foundation models for molecular representations is now more critical than ever.
Despite significant advancements in other domains, such as natural language processing (NLP) and computer vision, there is still no dominant foundation model tailored to molecular representation in drug discovery (Zhang et al., 2023b). This paper begins to address this pressing gap by introducing an innovative approach that leverages functional group (FG)-aware tokenization in the context of both sequence-based and graph-based molecular representations.
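The core idea of coarsening tokenization around functional groups can be illustrated with a toy example. The sketch below is not FARM's actual tokenizer (which detects functional groups on the molecular graph); it is a minimal greedy string matcher over a hypothetical table of functional-group SMILES patterns, included only to make the "expanded chemical lexicon" idea concrete.

```python
def fg_tokenize(smiles, groups):
    """Greedy longest-match tokenization: known functional-group SMILES
    substrings become single tokens; everything else falls back to
    character-level tokens. Toy illustration only -- plain substring
    matching cannot distinguish, e.g., an ester linkage from a free
    carboxylic acid, which is why real FG detection works on the graph."""
    patterns = sorted(groups, key=len, reverse=True)  # prefer longest match
    tokens, i = [], 0
    while i < len(smiles):
        for p in patterns:
            if smiles.startswith(p, i):
                tokens.append(groups[p])
                i += len(p)
                break
        else:
            tokens.append(smiles[i])
            i += 1
    return tokens

# Hypothetical functional-group vocabulary (not FARM's)
groups = {"C(=O)O": "<carboxyl>", "C(=O)N": "<amide>", "c1ccccc1": "<benzene>"}

# Aspirin: CC(=O)Oc1ccccc1C(=O)O
print(fg_tokenize("CC(=O)Oc1ccccc1C(=O)O", groups))
# -> ['C', '<carboxyl>', '<benzene>', '<carboxyl>']
```

Note how a 21-character SMILES collapses to four tokens, each aligned with a chemically meaningful unit rather than an arbitrary character.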


Robust Model-Based Optimization for Challenging Fitness Landscapes

arXiv.org Artificial Intelligence

Protein design, a grand challenge of the day, involves optimization on a fitness landscape, and leading methods adopt a model-based approach where a model is trained on a training set (protein sequences and fitness) and proposes candidates to explore next. These methods are challenged by sparsity of high-fitness samples in the training set, a problem that has been well documented in the literature. A less recognized but equally important problem stems from the distribution of training samples in the design space: leading methods are not designed for scenarios where the desired optimum is in a region that is not only poorly represented in training data, but also relatively far from the highly represented low-fitness regions. We show that this problem of "separation" in the design space is a significant bottleneck in existing model-based optimization tools and propose a new approach that uses a novel VAE as its search model to overcome the problem. We demonstrate its advantage over prior methods in robustly finding improved samples, regardless of the imbalance and separation between low- and high-fitness training samples. Our comprehensive benchmark on real and semi-synthetic protein datasets as well as solution design for physics-informed neural networks, showcases the generality of our approach in discrete and continuous design spaces. Our implementation is available at https://github.com/sabagh1994/PGVAE.
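The model-based loop the abstract describes (fit a search model on labeled designs, propose candidates, query the oracle, repeat) can be sketched generically. The sketch below uses a simple ridge-regression surrogate on a continuous design space; the paper's actual contribution is replacing the search model with a property-guided VAE (PGVAE), which is not reproduced here, and the function name and parameters are illustrative assumptions.

```python
import numpy as np

def mbo_loop(oracle, X, y, rounds=5, proposals=200, keep=20, sigma=0.2, seed=0):
    """Generic model-based optimization loop (toy sketch, not PGVAE).
    Each round: fit a ridge surrogate on all labeled data, propose noisy
    perturbations of the current best designs, keep the candidates the
    surrogate ranks highest, and label them with the true oracle."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X, float), np.asarray(y, float)
    for _ in range(rounds):
        # Ridge surrogate: w = (X^T X + lam*I)^{-1} X^T y
        lam = 1e-3
        w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
        # Propose candidates by perturbing the top-`keep` designs seen so far
        top = X[np.argsort(y)[-keep:]]
        cand = (top[rng.integers(len(top), size=proposals)]
                + sigma * rng.standard_normal((proposals, X.shape[1])))
        # Keep the candidates the surrogate scores highest; query the oracle
        best = cand[np.argsort(cand @ w)[-keep:]]
        X = np.vstack([X, best])
        y = np.concatenate([y, oracle(best)])
    return X, y

# Usage: maximize a simple concave fitness f(x) = -||x - 2||^2
oracle = lambda Z: -np.sum((Z - 2.0) ** 2, axis=1)
rng = np.random.default_rng(1)
X0 = rng.standard_normal((50, 3))
y0 = oracle(X0)
Xf, yf = mbo_loop(oracle, X0, y0)
print(yf.max() >= y0.max())
```

The "separation" failure mode the paper targets shows up here when the perturbation-based proposer cannot cross the gap between a densely sampled low-fitness region and a distant optimum; PGVAE's property-guided latent space is designed to search past exactly that bottleneck.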