AITopics | bioassay

Collaborating Authors

bioassay

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

WelQrate: Defining the Gold Standard in Small Molecule Drug Discovery Benchmarking

Neural Information Processing SystemsFeb-15-2026, 00:24:18 GMT

Thus, in this paper, we seek to establish a new gold standard for small molecule drug discovery benchmarking, W elQrate .

artificial intelligence, drug discovery, machine learning, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > Oregon (0.04)
North America > United States > Delaware > New Castle County > Wilmington (0.04)
Europe > Germany > Saxony > Leipzig (0.04)
Europe > Germany > Bavaria > Middle Franconia > Nuremberg (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.69)

Add feedback

Assay2Mol: large language model-based drug design using BioAssay context

Deng, Yifan, Ericksen, Spencer S., Gitter, Anthony

arXiv.org Artificial IntelligenceNov-18-2025

Scientific databases aggregate vast amounts of quantitative data alongside descriptive text. In biochemistry, molecule screening assays evaluate candidate molecules' functional responses against disease targets. Unstructured text that describes the biological mechanisms through which these targets operate, experimental screening protocols, and other attributes of assays offer rich information for drug discovery campaigns but has been untapped because of that unstructured format. We present Assay2Mol, a large language model-based workflow that can capitalize on the vast existing biochemical screening assays for early-stage drug discovery. Assay2Mol retrieves existing assay records involving targets similar to the new target and generates candidate molecules using in-context learning with the retrieved assay screening data. Assay2Mol outperforms recent machine learning approaches that generate candidate ligand molecules for target protein structures, while also promoting more synthesizable molecule generation.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

doi: 10.18653/v1/2025.emnlp-main.1390

2507.12574

Country: North America > United States (1.00)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.92)
Government > Regional Government > North America Government > United States Government > FDA (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

WelQrate: Defining the Gold Standard in Small Molecule Drug Discovery Benchmarking

Neural Information Processing SystemsOct-10-2025, 04:08:05 GMT

Thus, in this paper, we seek to establish a new gold standard for small molecule drug discovery benchmarking, W elQrate .

drug discovery, elqrate dataset collection, molecule, (11 more...)

Neural Information Processing Systems

Country:

North America > United States > Oregon (0.04)
North America > United States > Delaware > New Castle County > Wilmington (0.04)
Europe > Germany > Saxony > Leipzig (0.04)
Europe > Germany > Bavaria > Middle Franconia > Nuremberg (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.69)

Add feedback

WelQrate: Defining the Gold Standard in Small Molecule Drug Discovery Benchmarking

Liu, Yunchao, Dong, Ha, Wang, Xin, Moretti, Rocco, Wang, Yu, Su, Zhaoqian, Gu, Jiawei, Bodenheimer, Bobby, Weaver, Charles David, Meiler, Jens, Derr, Tyler

arXiv.org Artificial IntelligenceNov-14-2024

While deep learning has revolutionized computer-aided drug discovery, the AI community has predominantly focused on model innovation and placed less emphasis on establishing best benchmarking practices. We posit that without a sound model evaluation framework, the AI community's efforts cannot reach their full potential, thereby slowing the progress and transfer of innovation into real-world drug discovery. Thus, in this paper, we seek to establish a new gold standard for small molecule drug discovery benchmarking, WelQrate. Specifically, our contributions are threefold: WelQrate Dataset Collection - we introduce a meticulously curated collection of 9 datasets spanning 5 therapeutic target classes. Our hierarchical curation pipelines, designed by drug discovery experts, go beyond the primary high-throughput screen by leveraging additional confirmatory and counter screens along with rigorous domain-driven preprocessing, such as Pan-Assay Interference Compounds (PAINS) filtering, to ensure the high-quality data in the datasets; WelQrate Evaluation Framework - we propose a standardized model evaluation framework considering high-quality datasets, featurization, 3D conformation generation, evaluation metrics, and data splits, which provides a reliable benchmarking for drug discovery experts conducting real-world virtual screening; Benchmarking - we evaluate model performance through various research questions using the WelQrate dataset collection, exploring the effects of different models, dataset quality, featurization methods, and data splitting strategies on the results. In summary, we recommend adopting our proposed WelQrate as the gold standard in small molecule drug discovery benchmarking. The WelQrate dataset collection, along with the curation codes, and experimental scripts are all publicly available at WelQrate.org.

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2411.0982

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Delaware > New Castle County > Wilmington (0.04)
Europe > Germany > Bavaria > Middle Franconia > Nuremberg (0.04)
(2 more...)

Genre: Research Report > New Finding (0.48)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Oncology (0.93)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.93)
Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.94)

Add feedback

TwinBooster: Synergising Large Language Models with Barlow Twins and Gradient Boosting for Enhanced Molecular Property Prediction

Schuh, Maximilian G., Boldini, Davide, Sieber, Stephan A.

arXiv.org Artificial IntelligenceJan-30-2024

The success of drug discovery and development relies on the precise prediction of molecular activities and properties. While in silico molecular property prediction has shown remarkable potential, its use has been limited so far to assays for which large amounts of data are available. In this study, we use a fine-tuned large language model to integrate biological assays based on their textual information, coupled with Barlow Twins, a Siamese neural network using a novel self-supervised learning approach. This architecture uses both assay information and molecular fingerprints to extract the true molecular information. TwinBooster enables the prediction of properties of unseen bioassays and molecules by providing state-of-the-art zero-shot learning tasks. Remarkably, our artificial intelligence pipeline shows excellent performance on the FS-Mol benchmark. This breakthrough demonstrates the application of deep learning to critical property prediction tasks where data is typically scarce. By accelerating the early identification of active molecules in drug discovery and development, this method has the potential to help streamline the identification of novel therapeutics.

molecule, pr auc, prediction, (12 more...)

arXiv.org Artificial Intelligence

2401.04478

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Research Report > New Finding (0.88)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Genetic Disease (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

Multi-task Bioassay Pre-training for Protein-ligand Binding Affinity Prediction

Yan, Jiaxian, Ye, Zhaofeng, Yang, Ziyi, Lu, Chengqiang, Zhang, Shengyu, Liu, Qi, Qiu, Jiezhong

arXiv.org Artificial IntelligenceDec-20-2023

Protein-ligand binding affinity (PLBA) prediction is the fundamental task in drug discovery. Recently, various deep learning-based models predict binding affinity by incorporating the three-dimensional structure of protein-ligand complexes as input and achieving astounding progress. However, due to the scarcity of highquality training data, the generalization ability of current models is still limited. In addition, different bioassays use varying affinity measurement labels (i.e., IC50, Ki, Kd), and different experimental conditions inevitably introduce systematic noise, which poses a significant challenge to constructing high-precision affinity prediction models. To address these issues, we (1) propose Multi-task Bioassay Pre-training (MBP), a pre-training framework for structure-based PLBA prediction; (2) construct a pre-training dataset called ChEMBL-Dock with more than 300k experimentally measured affinity labels and about 2.8M docked three-dimensional structures. By introducing multi-task pre-training to treat the prediction of different affinity labels as different tasks and classifying relative rankings between samples from the same bioassay, MBP learns robust and transferrable structural knowledge from our new ChEMBL-Dock dataset with varied and noisy labels. Experiments substantiate the capability of MBP as a general framework that can improve and be tailored to mainstream structure-based PLBA prediction tasks. To the best of our knowledge, MBP is the first affinity pre-training model and shows great potential for future development. All codes and data are available on the online platform https://github.com/jiaxianyan/MBP.

dataset, mbp, prediction, (15 more...)

arXiv.org Artificial Intelligence

2306.04886

Country: Asia > China (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Enhancing Activity Prediction Models in Drug Discovery with the Ability to Understand Human Language

Seidl, Philipp, Vall, Andreu, Hochreiter, Sepp, Klambauer, Günter

arXiv.org Artificial IntelligenceJun-16-2023

Activity and property prediction models are the central workhorses in drug discovery and materials sciences, but currently they have to be trained or fine-tuned for new tasks. Without training or fine-tuning, scientific language models could be used for such low-data tasks through their announced zero- and few-shot capabilities. However, their predictive quality at activity prediction is lacking. In this work, we envision a novel type of activity prediction model that is able to adapt to new prediction tasks at inference time, via understanding textual information describing the task. To this end, we propose a new architecture with separate modules for chemical and natural language inputs, and a contrastive pre-training objective on data from large biochemical databases. In extensive experiments, we show that our method CLAMP yields improved predictive performance on few-shot learning benchmarks and zero-shot problems in drug discovery. We attribute the advances of our method to the modularized architecture and to our pre-training objective.

large language model, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2303.03363

Country:

Europe > Austria > Vienna (0.14)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > Dominican Republic (0.04)
(2 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.67)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Immunology (0.93)
Government > Regional Government > North America Government > United States Government > FDA (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)

Add feedback

Easy Semantification of Bioassays

Anteghini, Marco, D'Souza, Jennifer, Santos, Vitor A. P. Martins dos, Auer, Sören

arXiv.org Artificial IntelligenceDec-2-2021

Biological data and knowledge bases increasingly rely on Semantic Web technologies and the use of knowledge graphs for data integration, retrieval and federated queries. We propose a solution for automatically semantifying biological assays. Our solution contrasts the problem of automated semantification as labeling versus clustering where the two methods are on opposite ends of the method complexity spectrum. Characteristically modeling our problem, we find the clustering solution significantly outperforms a deep neural network state-of-the-art labeling approach. This novel contribution is based on two factors: 1) a learning objective closely modeled after the data outperforms an alternative approach with sophisticated semantic modeling; 2) automatically semantifying biological assays achieves a high performance F 1 of nearly 83%, which to our knowledge is the first reported standardized evaluation of the task offering a strong benchmark model.

bioassay, logical statement, semantification, (14 more...)

arXiv.org Artificial Intelligence

2111.15182

Country:

North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
(9 more...)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Communications > Web > Semantic Web (0.86)
(2 more...)

Add feedback

Improving Compound Activity Classification via Deep Transfer and Representation Learning

Dey, Vishal, Machiraju, Raghu, Ning, Xia

arXiv.org Artificial IntelligenceNov-14-2021

Recent advances in molecular machine learning, especially deep neural networks such as Graph Neural Networks (GNNs) for predicting structure activity relationships (SAR) have shown tremendous potential in computer-aided drug discovery. However, the applicability of such deep neural networks are limited by the requirement of large amounts of training data. In order to cope with limited training data for a target task, transfer learning for SAR modeling has been recently adopted to leverage information from data of related tasks. In this work, in contrast to the popular parameter-based transfer learning such as pretraining, we develop novel deep transfer learning methods TAc and TAc-fc to leverage source domain data and transfer useful information to the target domain. TAc learns to generate effective molecular features that can generalize well from one domain to another, and increase the classification performance in the target domain. Additionally, TAc-fc extends TAc by incorporating novel components to selectively learn feature-wise and compound-wise transferability. We used the bioassay screening data from PubChem, and identified 120 pairs of bioassays such that the active compounds in each pair are more similar to each other compared to its inactive compounds. Overall, TAc achieves the best performance with average ROC-AUC of 0.801; it significantly improves ROC-AUC of 83% target tasks with average task-wise performance improvement of 7.102%, compared to the best baseline FCN-dmpna (DT). Our experiments clearly demonstrate that TAc achieves significant improvement over all baselines across a large number of target tasks. Furthermore, although TAc-fc achieves slightly worse ROC-AUC on average compared to TAc (0.798 vs 0.801), TAc-fc still achieves the best performance on more tasks in terms of PR-AUC and F1 compared to other methods.

bioassay, compound, target task, (16 more...)

arXiv.org Artificial Intelligence

2111.07439

Country:

North America > United States > Ohio > Franklin County > Columbus (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report > New Finding (0.67)
Research Report > Experimental Study (0.46)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Oncology (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

SciBERT-based Semantification of Bioassays in the Open Research Knowledge Graph

Anteghini, Marco, D'Souza, Jennifer, Santos, Vitor A. P. Martins dos, Auer, Sören

arXiv.org Artificial IntelligenceSep-16-2020

As a novel contribution to the problem of semantifying biological assays, in this paper, we propose a neural-network-based approach to automatically semantify, thereby structure, unstructured bioassay text descriptions. Experimental evaluations, to this end, show promise as the neural-based semantification significantly outperforms a naive frequency-based baseline approach. Specifically, the neural method attains 72% F1 versus 47% F1 from the frequency-based method.

artificial intelligence, bioassay, machine learning, (12 more...)

arXiv.org Artificial Intelligence

2009.08801

Country:

Europe > Netherlands (0.05)
Europe > Germany > Lower Saxony > Hanover (0.05)
Europe > Germany > Berlin (0.05)

Genre: Research Report (0.50)

Industry:

Health & Medicine > Therapeutic Area (0.50)
Health & Medicine > Pharmaceuticals & Biotechnology (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.42)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.35)

Add feedback