AITopics | Schwaller, Philippe

Plotting

Schwaller, Philippe

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Molecular Hypergraph Neural Networks

Chen, Junwu, Schwaller, Philippe

arXiv.org Artificial IntelligenceDec-21-2023

Graph neural networks (GNNs) have demonstrated promising performance across various chemistry-related tasks. However, conventional graphs only model the pairwise connectivity in molecules, failing to adequately represent higher-order connections like multi-center bonds and conjugated structures. To tackle this challenge, we introduce molecular hypergraphs and propose Molecular Hypergraph Neural Networks (MHNN) to predict the optoelectronic properties of organic semiconductors, where hyperedges represent conjugated structures. A general algorithm is designed for irregular high-order connections, which can efficiently operate on molecular hypergraphs with hyperedges of various orders. The results show that MHNN outperforms all baseline models on most tasks of OPV, OCELOTv1 and PCQM4Mv2 datasets. Notably, MHNN achieves this without any 3D geometric information, surpassing the baseline model that utilizes atom positions. Moreover, MHNN achieves better performance than pretrained GNNs under limited training data, underscoring its excellent data efficiency. This work provides a new strategy for more general molecular representations and property prediction tasks related to high-order connections.

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2312.13136

Country: North America > United States (0.68)

Genre: Research Report (0.70)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

FSscore: A Machine Learning-based Synthetic Feasibility Score Leveraging Human Expertise

Neeser, Rebecca M., Correia, Bruno, Schwaller, Philippe

arXiv.org Artificial IntelligenceDec-19-2023

Determining whether a molecule can be synthesized is crucial for many aspects of chemistry and drug discovery, allowing prioritization of experimental work and ranking molecules in de novo design tasks. Existing scoring approaches to assess synthetic feasibility struggle to extrapolate to out-of-distribution chemical spaces or fail to discriminate based on minor differences such as chirality that might be obvious to trained chemists. This work aims to address these limitations by introducing the Focused Synthesizability score (FSscore), which learns to rank structures based on binary preferences using a graph attention network. First, a baseline trained on an extensive set of reactant-product pairs is established that subsequently is fine-tuned with expert human feedback on a chemical space of interest. Fine-tuning on focused datasets improves performance on these chemical scopes over the pre-trained model exhibiting moderate performance and generalizability. This enables distinguishing hard- from easy-to-synthesize molecules and improving the synthetic accessibility of generative model outputs. On very complex scopes with limited labels achieving satisfactory gains remains challenging. The FSscore showcases how human expert feedback can be utilized to optimize the assessment of synthetic feasibility for a variety of applications.

artificial intelligence, machine learning, molecule, (19 more...)

arXiv.org Artificial Intelligence

2312.12737

Country: North America > United States (0.28)

Genre: Research Report (0.64)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

Holistic chemical evaluation reveals pitfalls in reaction prediction models

Gil, Victor Sabanza, Bran, Andres M., Franke, Malte, Schlama, Remi, Luterbacher, Jeremy S., Schwaller, Philippe

arXiv.org Artificial IntelligenceDec-14-2023

The prediction of chemical reactions has gained significant interest within the machine learning community in recent years, owing to its complexity and crucial applications in chemistry. However, model evaluation for this task has been mostly limited to simple metrics like top-k accuracy, which obfuscates fine details of a model's limitations. Inspired by progress in other fields, we propose a new assessment scheme that builds on top of current approaches, steering towards a more holistic evaluation. We introduce the following key components for this goal: CHORISO, a curated dataset along with multiple tailored splits to recreate chemically relevant scenarios, and a collection of metrics that provide a holistic view of a model's advantages and limitations. Application of this method to state-of-the-art models reveals important differences on sensitive fronts, especially stereoselectivity and chemical out-of-distribution generalization. Our work paves the way towards robust prediction models that can ultimately accelerate chemical discovery.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2312.09004

Country:

North America > United States (0.16)
Europe (0.14)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Materials > Chemicals (0.94)
Energy (0.94)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Extracting human interpretable structure-property relationships in chemistry using XAI and large language models

Wellawatte, Geemi P., Schwaller, Philippe

arXiv.org Artificial IntelligenceNov-7-2023

Explainable Artificial Intelligence (XAI) is an emerging field in AI that aims to address the opaque nature of machine learning models. Furthermore, it has been shown that XAI can be used to extract input-output relationships, making them a useful tool in chemistry to understand structure-property relationships. However, one of the main limitations of XAI methods is that they are developed for technically oriented users. We propose the XpertAI framework that integrates XAI methods with large language models (LLMs) accessing scientific literature to generate accessible natural language explanations of raw chemical data automatically. We conducted 5 case studies to evaluate the performance of XpertAI. Our results show that XpertAI combines the strengths of LLMs and XAI tools in generating specific, scientific, and interpretable explanations.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2311.04047

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.54)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Energy (0.95)
Materials > Chemicals > Commodity Chemicals > Petrochemicals (0.31)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Add feedback

Transformers and Large Language Models for Chemistry and Drug Discovery

Bran, Andres M, Schwaller, Philippe

arXiv.org Artificial IntelligenceOct-9-2023

Language modeling has seen impressive progress over the last years, mainly prompted by the invention of the Transformer architecture, sparking a revolution in many fields of machine learning, with breakthroughs in chemistry and biology. In this chapter, we explore how analogies between chemical and natural language have inspired the use of Transformers to tackle important bottlenecks in the drug discovery process, such as retrosynthetic planning and chemical space exploration. The revolution started with models able to perform particular tasks with a single type of data, like linearised molecular graphs, which then evolved to include other types of data, like spectra from analytical instruments, synthesis actions, and human language. A new trend leverages recent developments in large language models, giving rise to a wave of models capable of solving generic tasks in chemistry, all facilitated by the flexibility of natural language. As we continue to explore and harness these capabilities, we can look forward to a future where machine learning plays an even more integral role in accelerating scientific discovery.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2310.06083

Country:

North America > United States (0.46)
Europe (0.28)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

ODEFormer: Symbolic Regression of Dynamical Systems with Transformers

d'Ascoli, Stéphane, Becker, Sören, Mathis, Alexander, Schwaller, Philippe, Kilbertus, Niki

arXiv.org Artificial IntelligenceOct-9-2023

Recent triumphs of machine learning (ML) spark growing enthusiasm for accelerating scientific discovery [1-3]. In particular, inferring dynamical laws governing observational data is an extremely challenging task that is anticipated to benefit substantially from modern ML methods. Modeling dynamical systems for forecasting, control, and system identification has been studied by various communities within ML. Successful modern approaches are primarily based on advances in deep learning, such as neural ordinary differential equation (NODE) (see Chen et al. [4] and many extensions thereof). However, these models typically lack interpretability due to their black-box nature, which has inspired extensive research on explainable ML of overparameterized models [5, 6].

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2310.05573

Country:

Europe (0.67)
North America > United States > Massachusetts (0.14)

Genre: Research Report (0.82)

Industry:

Health & Medicine (0.93)
Transportation > Air (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Add feedback

ChemCrow: Augmenting large-language models with chemistry tools

Bran, Andres M, Cox, Sam, Schilter, Oliver, Baldassari, Carlo, White, Andrew D, Schwaller, Philippe

arXiv.org Machine LearningOct-2-2023

Over the last decades, excellent computational chemistry tools have been developed. Integrating them into a single platform with enhanced accessibility could help reaching their full potential by overcoming steep learning curves. Recently, large-language models (LLMs) have shown strong performance in tasks across domains, but struggle with chemistry-related problems. Moreover, these models lack access to external knowledge sources, limiting their usefulness in scientific applications. In this study, we introduce ChemCrow, an LLM chemistry agent designed to accomplish tasks across organic synthesis, drug discovery, and materials design. By integrating 18 expert-designed tools, ChemCrow augments the LLM performance in chemistry, and new capabilities emerge. Our agent autonomously planned and executed the syntheses of an insect repellent, three organocatalysts, and guided the discovery of a novel chromophore. Our evaluation, including both LLM and expert assessments, demonstrates ChemCrow's effectiveness in automating a diverse set of chemical tasks. Surprisingly, we find that GPT-4 as an evaluator cannot distinguish between clearly wrong GPT-4 completions and Chemcrow's performance. Our work not only aids expert chemists and lowers barriers for non-experts, but also fosters scientific advancement by bridging the gap between experimental and computational chemistry.

chemcrow, machine learning, natural language, (18 more...)

arXiv.org Machine Learning

2304.05376

Country:

North America > United States (0.14)
Europe (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Materials > Chemicals > Commodity Chemicals > Petrochemicals (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Beam Enumeration: Probabilistic Explainability For Sample Efficient Self-conditioned Molecular Design

Guo, Jeff, Schwaller, Philippe

arXiv.org Artificial IntelligenceSep-25-2023

Generative molecular design has moved from proof-of-concept to real-world applicability, as marked by the surge in very recent papers reporting experimental validation. Key challenges in explainability and sample efficiency present opportunities to enhance generative design to directly optimize expensive high-fidelity oracles and provide actionable insights to domain experts. Here, we propose Beam Enumeration to exhaustively enumerate the most probable sub-sequences from language-based molecular generative models and show that molecular substructures can be extracted. When coupled with reinforcement learning, extracted substructures become meaningful, providing a source of explainability and improving sample efficiency through self-conditioned generation. Beam Enumeration is generally applicable to any language-based molecular generative model and notably further improves the performance of the recently reported Augmented Memory algorithm, which achieved the new state-of-the-art on the Practical Molecular Optimization benchmark for sample efficiency. The combined algorithm generates more high reward molecules and faster, given a fixed oracle budget. Beam Enumeration is the first method to jointly address explainability and sample efficiency for molecular design.

machine learning, natural language, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2309.13957

Country:

Europe > Switzerland (0.14)
North America > United States (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.88)

Add feedback

14 Examples of How LLMs Can Transform Materials Science and Chemistry: A Reflection on a Large Language Model Hackathon

Jablonka, Kevin Maik, Ai, Qianxiang, Al-Feghali, Alexander, Badhwar, Shruti, Bocarsly, Joshua D., Bran, Andres M, Bringuier, Stefan, Brinson, L. Catherine, Choudhary, Kamal, Circi, Defne, Cox, Sam, de Jong, Wibe A., Evans, Matthew L., Gastellu, Nicolas, Genzling, Jerome, Gil, María Victoria, Gupta, Ankur K., Hong, Zhi, Imran, Alishba, Kruschwitz, Sabine, Labarre, Anne, Lála, Jakub, Liu, Tao, Ma, Steven, Majumdar, Sauradeep, Merz, Garrett W., Moitessier, Nicolas, Moubarak, Elias, Mouriño, Beatriz, Pelkie, Brenden, Pieler, Michael, Ramos, Mayk Caldas, Ranković, Bojana, Rodriques, Samuel G., Sanders, Jacob N., Schwaller, Philippe, Schwarting, Marcus, Shi, Jiale, Smit, Berend, Smith, Ben E., Van Herck, Joren, Völker, Christoph, Ward, Logan, Warren, Sean, Weiser, Benjamin, Zhang, Sylvester, Zhang, Xiaoqi, Zia, Ghezal Ahmad, Scourtas, Aristana, Schmidt, KJ, Foster, Ian, White, Andrew D., Blaiszik, Ben

arXiv.org Artificial IntelligenceJul-14-2023

Large-language models (LLMs) such as GPT-4 caught the interest of many scientists. Recent studies suggested that these models could be useful in chemistry and materials science. To explore these possibilities, we organized a hackathon. This article chronicles the projects built as part of this hackathon. Participants employed LLMs for various applications, including predicting properties of molecules and materials, designing novel interfaces for tools, extracting knowledge from unstructured data, and developing new educational applications. The diverse topics and the fact that working prototypes could be generated in less than two days highlight that LLMs will profoundly impact the future of our fields. The rich collection of ideas and projects also indicates that the applications of LLMs are not limited to materials science and chemistry but offer potential benefits to a wide range of scientific disciplines.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

doi: 10.1039/D3DD00113J

2306.06283

Country:

Europe (1.00)
North America > United States > California > Alameda County > Berkeley (0.14)
North America > Canada > Quebec > Montreal (0.14)
(3 more...)

Genre:

Research Report > Promising Solution (1.00)
Research Report > New Finding (1.00)
Instructional Material > Course Syllabus & Notes (0.67)

Industry:

Materials > Construction Materials (1.00)
Materials > Chemicals (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Augmented Memory: Capitalizing on Experience Replay to Accelerate De Novo Molecular Design

Guo, Jeff, Schwaller, Philippe

arXiv.org Artificial IntelligenceMay-10-2023

Sample efficiency is a fundamental challenge in de novo molecular design. Ideally, molecular generative models should learn to satisfy a desired objective under minimal oracle evaluations (computational prediction or wet-lab experiment). This problem becomes more apparent when using oracles that can provide increased predictive accuracy but impose a significant cost. Consequently, these oracles cannot be directly optimized under a practical budget. Molecular generative models have shown remarkable sample efficiency when coupled with reinforcement learning, as demonstrated in the Practical Molecular Optimization (PMO) benchmark. Here, we propose a novel algorithm called Augmented Memory that combines data augmentation with experience replay. We show that scores obtained from oracle calls can be reused to update the model multiple times. We compare Augmented Memory to previously proposed algorithms and show significantly enhanced sample efficiency in an exploitation task and a drug discovery case study requiring both exploration and exploitation. Our method achieves a new state-of-the-art in the PMO benchmark which enforces a computational budget, outperforming the previous best performing method on 19/23 tasks.

augmented memory, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

2305.1616

Country:

Europe > Switzerland (0.14)
North America > United States (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Energy > Oil & Gas > Upstream (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)

Add feedback