AITopics | query molecule

query molecule

Pharmacophore-based design by learning on voxel grids

Mahmood, Omar, Pinheiro, Pedro O., Bonneau, Richard, Saremi, Saeed, Sresht, Vishnu

arXiv.org Artificial Intelligence, Dec-3-2025

Ligand-based drug discovery (LBDD) uses known binders to a protein target to find structurally diverse molecules that are similarly likely to bind. This process typically involves a brute-force search of the known binder (query) against a molecular library using some metric of molecular similarity. One popular approach overlays the pharmacophore-shape profile of the known binder onto 3D conformations enumerated for each of the library molecules, computes overlaps, and picks a set of diverse library molecules with high overlaps. While this virtual screening workflow has had considerable success in hit diversification, scaffold hopping, and patent busting, it scales poorly with library sizes and restricts candidate generation to existing library compounds. Leveraging recent advances in voxel-based generative modelling, we propose a pharmacophore-based generative model and workflows that address the scaling and fecundity issues of conventional pharmacophore-based virtual screening. We introduce VoxCap, a voxel captioning method for generating SMILES strings from voxelised molecular representations. We propose two workflows as practical use cases as well as benchmarks for pharmacophore-based generation: de novo design, in which we aim to generate new molecules with high pharmacophore-shape similarities to query molecules, and fast search, which aims to combine generative design with a cheap 2D substructure similarity search for efficient hit identification. Our results show that VoxCap significantly outperforms previous methods in generating diverse de novo hits. When combined with our fast search workflow, VoxCap reduces computational time by orders of magnitude while returning hits for all query molecules, enabling the search of large libraries that are intractable to search by brute force.
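To make the brute-force baseline in this abstract concrete, here is a minimal sketch of a similarity search of one query against a library. It is not VoxCap and not the pharmacophore-shape overlay; it assumes each molecule is already reduced to a set of 2D fingerprint bit indices and uses Tanimoto similarity as the metric. The fingerprints, molecule names, and function names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def brute_force_search(
    query: FrozenSet[int],
    library: Dict[str, FrozenSet[int]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Score every library molecule against the query and keep the best top_k.

    Cost grows linearly with library size, which is the scaling problem
    the abstract describes.
    """
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy data: hypothetical fingerprint bit indices, not real molecules.
query_fp = frozenset({1, 4, 7, 9})
library = {
    "mol_a": frozenset({1, 4, 7, 9}),
    "mol_b": frozenset({1, 4, 8}),
    "mol_c": frozenset({2, 3}),
}
hits = brute_force_search(query_fp, library, top_k=2)
```

In practice the fingerprints would come from a cheminformatics toolkit; plain sets are used here only to keep the sketch self-contained.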

artificial intelligence, machine learning, molecule, (18 more...)

2512.02031

Genre:

Workflow (0.96)
Research Report > New Finding (0.54)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

General Intelligence-based Fragmentation (GIF): A framework for peak-labeled spectra simulation

Martin, Margaret R., Hassoun, Soha

arXiv.org Artificial Intelligence, Nov-14-2025

Despite growing reference libraries and advanced computational tools, progress in the field of metabolomics remains constrained by low rates of annotating measured spectra. The recent developments of large language models (LLMs) have led to strong performance across a wide range of generation and reasoning tasks, spurring increased interest in LLMs' application to domain-specific scientific challenges, such as mass spectra annotation. Here, we present a novel framework, General Intelligence-based Fragmentation (GIF), that guides pretrained LLMs through spectra simulation using structured prompting and reasoning. GIF utilizes tagging, structured inputs/outputs, system prompts, instruction-based prompts, and iterative refinement. Indeed, GIF offers a structured alternative to ad hoc prompting, underscoring the need for systematic guidance of LLMs on complex scientific tasks. Using GIF, we evaluate current generalist LLMs' ability to use reasoning towards fragmentation and to perform intensity prediction after fine-tuning. We benchmark performance on a novel QA dataset, the MassSpecGym QA-sim dataset, that we derive from the MassSpecGym dataset. Through these implementations of GIF, we find that GPT-4o and GPT-4o-mini achieve a cosine similarity of 0.36 and 0.35 between the simulated and true spectra, respectively, outperforming other pretrained models including GPT-5, Llama-3.1, and ChemDFM, despite GPT-5's recency and ChemDFM's domain specialization. GIF outperforms several deep learning baselines. Our evaluation of GIF highlights the value of using LLMs not only for spectra simulation but for enabling human-in-the-loop workflows and structured, explainable reasoning in molecular fragmentation.
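The cosine similarity scores reported in this abstract compare a simulated spectrum with a measured one. The sketch below shows one common way to compute such a score: bin peaks by m/z, then take the cosine of the two intensity vectors. The abstract does not state the binning scheme, so the 1.0 bin width, the peak lists, and the function names are assumptions for illustration only.

```python
import math
from typing import Dict, List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def bin_spectrum(peaks: List[Peak], bin_width: float = 1.0) -> Dict[int, float]:
    """Sum peak intensities into fixed-width m/z bins (assumed scheme)."""
    binned: Dict[int, float] = {}
    for mz, intensity in peaks:
        index = int(mz // bin_width)
        binned[index] = binned.get(index, 0.0) + intensity
    return binned


def cosine_similarity(
    simulated: List[Peak], measured: List[Peak], bin_width: float = 1.0
) -> float:
    """Cosine of the angle between two binned intensity vectors."""
    a = bin_spectrum(simulated, bin_width)
    b = bin_spectrum(measured, bin_width)
    dot = sum(value * b.get(index, 0.0) for index, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy spectra: one shared peak near m/z 50, one unmatched peak each.
simulated = [(50.1, 100.0), (77.0, 50.0)]
measured = [(50.3, 100.0), (91.2, 50.0)]
score = cosine_similarity(simulated, measured)
```

With these toy peaks the shared bin dominates, so the score is high but below 1; a spectrum compared with itself scores 1.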

large language model, machine learning, natural language, (16 more...)

2511.09571

Country: North America > United States (0.28)

Genre:

Workflow (0.89)
Research Report > New Finding (0.46)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

DrugMCTS: a drug repurposing framework combining multi-agent, RAG and Monte Carlo Tree Search

Yang, Zerui, Wan, Yuwei, Yan, Siyu, Matsuda, Yudai, Xie, Tong, Hoex, Bram, Song, Linqi

arXiv.org Artificial Intelligence, Aug-1-2025

Recent advances in large language models have demonstrated considerable potential in scientific domains such as drug repositioning. However, their effectiveness remains constrained when reasoning extends beyond the knowledge acquired during pre-training. Conventional approaches, such as fine-tuning or retrieval-augmented generation, either impose high computational overhead or fail to fully exploit structured scientific data. To overcome these challenges, we propose DrugMCTS, a novel framework that synergistically integrates RAG, multi-agent collaboration, and Monte Carlo Tree Search for drug repositioning. The framework employs five specialized agents tasked with retrieving and analyzing molecular and protein information, thereby enabling structured and iterative reasoning. Extensive experiments on the DrugBank and KIBA datasets demonstrate that DrugMCTS achieves substantially higher recall and robustness compared to both general-purpose LLMs and deep learning baselines. Our results highlight the importance of structured reasoning, agent-based collaboration, and feedback-driven search mechanisms in advancing LLM applications for drug repositioning.

large language model, machine learning, natural language, (18 more...)

2507.07426

Country:

Asia > China (0.47)
Oceania > Australia > New South Wales (0.28)

Genre: Research Report > New Finding (0.66)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

RAG-Enhanced Collaborative LLM Agents for Drug Discovery

Lee, Namkyeong, De Brouwer, Edward, Hajiramezanali, Ehsan, Biancalani, Tommaso, Park, Chanyoung, Scalia, Gabriele

arXiv.org Artificial Intelligence, Mar-10-2025

Recent advances in large language models (LLMs) have shown great potential to accelerate drug discovery. However, the specialized nature of biochemical data often necessitates costly domain-specific fine-tuning, posing critical challenges. First, it hinders the application of more flexible general-purpose LLMs in cutting-edge drug discovery tasks. More importantly, it impedes the rapid integration of the vast amounts of scientific data continuously generated through experiments and research. To investigate these challenges, we propose CLADD, a retrieval-augmented generation (RAG)-empowered agentic system tailored to drug discovery tasks. Through the collaboration of multiple LLM agents, CLADD dynamically retrieves information from biomedical knowledge bases, contextualizes query molecules, and integrates relevant evidence to generate responses -- all without the need for domain-specific fine-tuning. Crucially, we tackle key obstacles in applying RAG workflows to biochemical data, including data heterogeneity, ambiguity, and multi-source integration. We demonstrate the flexibility and effectiveness of this framework across a variety of drug discovery tasks, showing that it outperforms general-purpose and domain-specific LLMs as well as traditional deep learning approaches.

agent, molecule, rag-enhanced collaborative llm agent, (12 more...)

2502.17506

Country:

North America > United States (0.28)
Asia > Thailand > Bangkok > Bangkok (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Oncology (0.93)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Intelligent System for Automated Molecular Patent Infringement Assessment

Shi, Yaorui, Li, Sihang, Zhang, Taiyan, Fang, Xi, Wang, Jiankun, Liu, Zhiyuan, Zhao, Guojiang, Zhu, Zhengdan, Gao, Zhifeng, Zhong, Renxin, Zhang, Linfeng, Ke, Guolin, E, Weinan, Cai, Hengxing, Wang, Xiang

arXiv.org Artificial Intelligence, Jan-12-2025

Automated drug discovery offers significant potential for accelerating the development of novel therapeutics by substituting labor-intensive human workflows with machine-driven processes. However, molecules generated by artificial intelligence may unintentionally infringe on existing patents, posing legal and financial risks that impede the full automation of drug discovery pipelines. This paper introduces PatentFinder, a novel multi-agent and tool-enhanced intelligence system that can accurately and comprehensively evaluate small molecules for patent infringement. PatentFinder features five specialized agents that collaboratively analyze patent claims and molecular structures with heuristic and model-based tools, generating interpretable infringement reports. To support systematic evaluation, we curate MolPatent-240, a benchmark dataset tailored for patent infringement assessment algorithms. On this benchmark, PatentFinder outperforms baseline methods that rely solely on large language models or specialized chemical tools, achieving a 13.8% improvement in F1-score and a 12% increase in accuracy. Additionally, PatentFinder autonomously generates detailed and interpretable patent infringement reports, showcasing enhanced accuracy and improved interpretability. The high accuracy and interpretability of PatentFinder make it a valuable and reliable tool for automating patent infringement assessments, offering a practical solution for integrating patent protection analysis into the drug discovery pipeline.

molecule, patent, query molecule, (13 more...)

2412.07819

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Austria > Vienna (0.14)
Asia > China > Beijing > Beijing (0.04)
(9 more...)

Genre: Research Report > New Finding (0.92)

Industry: Law > Intellectual Property & Technology Law (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Cloud-Based Real-Time Molecular Screening Platform with MolFormer

Belgodere, Brian, Chenthamarakshan, Vijil, Das, Payel, Dognin, Pierre, Kurien, Toby, Melnyk, Igor, Mroueh, Youssef, Padhi, Inkit, Rigotti, Mattia, Ross, Jarret, Schiff, Yair, Young, Richard A.

arXiv.org Artificial Intelligence, Aug-13-2022

With the prospect of automating a number of chemical tasks with high fidelity, chemical language processing models are emerging at a rapid speed. Here, we present a cloud-based real-time platform that allows users to virtually screen molecules of interest. For this purpose, molecular embeddings inferred from a recently proposed large chemical language model, named MolFormer, are leveraged. The platform currently supports three tasks: nearest neighbor retrieval, chemical space visualization, and property prediction. Based on the functionalities of this platform and results obtained, we believe that such a platform can play a pivotal role in automating chemistry and chemical engineering research, as well as assist in drug discovery and material design tasks. A demo of our platform is provided at www.ibm.biz/molecular_demo.
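Nearest neighbor retrieval over molecular embeddings, one of the three tasks listed in this abstract, can be sketched as follows. This is not the platform's implementation: the embeddings here are made-up two-dimensional vectors rather than MolFormer outputs, Euclidean distance is an assumed metric, and the function name is hypothetical.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def nearest_neighbors(
    query: Sequence[float],
    embeddings: Dict[str, Sequence[float]],
    k: int = 2,
) -> List[Tuple[float, str]]:
    """Return the k (distance, name) pairs closest to the query vector."""
    distances = (
        (math.dist(query, vector), name) for name, vector in embeddings.items()
    )
    return heapq.nsmallest(k, distances)


# Toy embeddings: hypothetical 2D vectors standing in for model outputs.
embeddings = {
    "mol_a": [0.0, 0.0],
    "mol_b": [1.0, 0.0],
    "mol_c": [3.0, 4.0],
}
neighbors = nearest_neighbors([0.0, 0.1], embeddings, k=2)
```

A linear scan like this is enough for a small library; a real-time service over a large one would more likely rely on an approximate nearest neighbor index.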

cloud-based real-time molecular screening platform, molecule, molformer, (11 more...)

2208.06665

Genre: Research Report (0.51)

Industry:

Information Technology > Services (0.73)
Health & Medicine > Pharmaceuticals & Biotechnology (0.53)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.35)

ROCS-Derived Features for Virtual Screening

Kearnes, Steven, Pande, Vijay

arXiv.org Machine Learning, Aug-22-2016

Ligand-based virtual screening is based on the assumption that similar compounds have similar biological activity [Willett, 2009]. Compound similarity can be assessed in many ways, including comparisons of molecular "fingerprints" that encode structural features or molecular properties [Todeschini and Consonni, 2009] and measurements of shape, chemical, and/or electrostatic similarity in three dimensions [Hawkins et al., 2007; Muchmore et al., 2006; Ballester and Richards, 2007]. Three-dimensional approaches such as rapid overlay of chemical structures (ROCS) [Hawkins et al., 2007] are especially interesting because of their potential to identify molecules that are similar from the point of view of a target protein but dissimilar in underlying chemical structure ("scaffold hopping"; [Böhm et al., 2004]). ROCS represents atoms as three-dimensional Gaussian functions [Grant and Pickup, 1995; Grant et al., 1996] and calculates similarity as a function of volume overlaps between alignments of pre-generated molecular conformers. Chemical ("color") similarity is measured by overlaps between dummy atoms marking interesting chemical functionalities: hydrogen bond donors and acceptors, charged functional groups, rings, and hydrophobic groups.
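The Gaussian volume overlap described in this abstract can be illustrated with a small sketch. It is not the ROCS implementation: it keeps only pairwise atom-atom terms, gives every atom the same Gaussian amplitude and width (the values of `P` and `ALPHA` are placeholders), and assumes the two molecules are already aligned, whereas ROCS also optimizes the alignment. The overlap of two equal-width spherical Gaussians a distance d apart has the closed form p^2 (pi / (2 alpha))^(3/2) exp(-alpha d^2 / 2).

```python
import math
from typing import List, Tuple

Atom = Tuple[float, float, float]  # atom centre (x, y, z)

ALPHA = 0.8  # assumed Gaussian exponent, identical for every atom
P = 1.0      # assumed Gaussian amplitude, identical for every atom


def pair_overlap(a: Atom, b: Atom, alpha: float = ALPHA, p: float = P) -> float:
    """Closed-form overlap integral of two equal-width spherical Gaussians."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return p * p * (math.pi / (2.0 * alpha)) ** 1.5 * math.exp(-alpha * d2 / 2.0)


def volume_overlap(mol_a: List[Atom], mol_b: List[Atom]) -> float:
    """First-order overlap: sum over all atom pairs, no higher-order terms."""
    return sum(pair_overlap(a, b) for a in mol_a for b in mol_b)


def shape_tanimoto(mol_a: List[Atom], mol_b: List[Atom]) -> float:
    """Tanimoto coefficient built from overlap volumes."""
    o_ab = volume_overlap(mol_a, mol_b)
    o_aa = volume_overlap(mol_a, mol_a)
    o_bb = volume_overlap(mol_b, mol_b)
    return o_ab / (o_aa + o_bb - o_ab)


# Toy "molecules": two atoms each, hypothetical coordinates.
mol = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
nudged = [(0.2, 0.0, 0.0), (1.7, 0.0, 0.0)]
far_away = [(100.0, 0.0, 0.0), (101.5, 0.0, 0.0)]

same = shape_tanimoto(mol, mol)
close = shape_tanimoto(mol, nudged)
distant = shape_tanimoto(mol, far_away)
```

The score is 1 for identical aligned shapes, drops as the overlap shrinks, and approaches 0 for shapes that do not overlap at all. The same construction applied to the dummy "color" atoms would give a chemical similarity score.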

artificial intelligence, machine learning, sign test 95, (14 more...)

doi: 10.1007/s10822-016-9959-3

1606.01822

Country: North America > United States (0.14)

Genre: Research Report > Experimental Study (0.32)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Data Science (0.68)
