AITopics

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
(2 more...)

Neural Information Processing SystemsJun-14-2026, 11:13:03 GMT

BioCG: Constrained Generative Modeling for Biochemical Interaction Prediction

Predicting interactions between biochemical entities is a core challenge in drug discovery and systems biology, often hindered by limited data and poor generalization to unseen entities. Traditional discriminative models frequently underperform in such settings. We propose BioCG (Biochemical Constrained Generation), a novel framework that reformulates interaction prediction as a constrained sequence generation task. BioCG encodes target entities as unique discrete sequences via Iterative Residual Vector Quantization (I-RVQ) and trains a generative model to produce the sequence of an interacting partner given a query entity. A trie-guided constrained decoding mechanism, built from a catalog of valid target sequences, concentrates the model's learning on the critical distinctions between valid biochemical options, ensuring all outputs correspond to an entity within the pre-defined target catalog. An information-weighted training objective further focuses learning on the most critical decision points. BioCG achieves state-of-the-art (SOTA) performance across diverse tasks, Drug-Target Interaction (DTI), Drug-Drug Interaction (DDI), and Enzyme-Reaction Prediction, especially in data-scarce and cold-start conditions.

large language model, machine learning, natural language, (19 more...)

Country: North America > United States > Minnesota (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

Neural Information Processing SystemsFeb-12-2026, 11:46:34 GMT

e467582d42d9c13fa9603df16f31de6d-Paper-Datasets_and_Benchmarks.pdf

learning, prediction, protein, (14 more...)

Country:

North America > Canada > Quebec > Montreal (0.04)
Europe > France (0.04)
Asia > Middle East > Jordan (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report (0.93)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Biomedical Informatics (0.94)

Neural Information Processing SystemsFeb-11-2026, 16:32:40 GMT

FlexMol: A Flexible Toolkit for Benchmarking Molecular Relational Learning

However, DeepPurpose inherits several limitations from early MRL models.

artificial intelligence, machine learning, natural language, (20 more...)

Country:

Asia > China (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Asia > Singapore (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)

Neural Information Processing SystemsDec-25-2025, 13:27:40 GMT

PEER: A Comprehensive and Multi-Task Benchmark for Protein Sequence Understanding

We are now witnessing significant progress of deep learning methods in a variety of tasks (or datasets) of proteins. However, there is a lack of a standard benchmark to evaluate the performance of different methods, which hinders the progress of deep learning in this field. In this paper, we propose such a benchmark called PEER, a comprehensive and multi-task benchmark for Protein sEquence undERstanding. PEER provides a set of diverse protein understanding tasks including protein function prediction, protein localization prediction, protein structure prediction, protein-protein interaction prediction, and protein-ligand interaction prediction. We evaluate different types of sequence-based methods for each task including traditional feature engineering approaches, different sequence encoding methods as well as large-scale pre-trained protein language models. In addition, we also investigate the performance of these methods under the multi-task learning setting. Experimental results show that large-scale pre-trained protein language models achieve the best performance for most individual tasks, and jointly training multiple tasks further boosts the performance. The datasets and source codes of this benchmark will be open-sourced soon.

artificial intelligence, machine learning, proceedings, (8 more...)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.83)

arXiv.org Artificial IntelligenceDec-11-2025

KGOT: Unified Knowledge Graph and Optimal Transport Pseudo-Labeling for Molecule-Protein Interaction Prediction

Qin, Jiayu, Luo, Zhengquan, Tadmor, Guy, Chen, Changyou, Zeevi, David, Xu, Zhiqiang

Predicting molecule-protein interactions (MPIs) is a fundamental task in computational biology, with crucial applications in drug discovery and molecular function annotation. However, existing MPI models face two major challenges. First, the scarcity of labeled molecule-protein pairs significantly limits model performance, as available datasets capture only a small fraction of biological relevant interactions. Second, most methods rely solely on molecular and protein features, ignoring broader biological context such as genes, metabolic pathways, and functional annotations that could provide essential complementary information. To address these limitations, our framework first aggregates diverse biological datasets, including molecular, protein, genes and pathway-level interactions, and then develop an optimal transport-based approach to generate high-quality pseudo-labels for unlabeled molecule-protein pairs, leveraging the underlying distribution of known interactions to guide label assignment. By treating pseudo-labeling as a mechanism for bridging disparate biological modalities, our approach enables the effective use of heterogeneous data to enhance MPI prediction. We evaluate our framework on multiple MPI datasets including virtual screening tasks and protein retrieval tasks, demonstrating substantial improvements over state-of-the-art methods in prediction accuracies and zero shot ability across unseen interactions. Beyond MPI prediction, our approach provides a new paradigm for leveraging diverse biological data sources to tackle problems traditionally constrained by single- or bi-modal learning, paving the way for future advances in computational biology and drug discovery.

artificial intelligence, machine learning, natural language, (17 more...)

2512.09365

Country: Asia (0.28)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Ilídio, Pedro, Nakano, Felipe Kenji, Gharahighehi, Alireza, D'hondt, Robbe, Cerri, Ricardo, Vens, Celine

Oxytrees: Model Trees for Bipartite Learning

arXiv.org Artificial IntelligenceNov-18-2025

Bipartite learning is a machine learning task that aims to predict interactions between pairs of instances. It has been applied to various domains, including drug-target interactions, RNA-disease associations, and regulatory network inference. Despite being widely investigated, current methods still present drawbacks, as they are often designed for a specific application and thus do not generalize to other problems or present scalability issues. To address these challenges, we propose Oxytrees: proxy-based biclustering model trees. Oxytrees compress the interaction matrix into row- and column-wise proxy matrices, significantly reducing training time without compromising predictive performance. We also propose a new leaf-assignment algorithm that significantly reduces the time taken for prediction. Finally, Oxytrees employ linear models using the Kronecker product kernel in their leaves, resulting in shallower trees and thus even faster training. Using 15 datasets, we compared the predictive performance of ensembles of Oxytrees with that of the current state-of-the-art. We achieved up to 30-fold improvement in training times compared to state-of-the-art biclustering forests, while demonstrating competitive or superior performance in most evaluation settings, particularly in the inductive setting. Finally, we provide an intuitive Python API to access all datasets, methods and evaluation measures used in this work, thus enabling reproducible research in this field.

artificial intelligence, machine learning, prediction, (17 more...)

2511.12713

Country: Europe (0.28)

Genre: Research Report > New Finding (0.93)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

arXiv.org Artificial IntelligenceNov-11-2025

LLM$^3$-DTI: A Large Language Model and Multi-modal data co-powered framework for Drug-Target Interaction prediction

Zhang, Yuhao, Guo, Qinghong, Chen, Qixian, Zhang, Liuwei, Cui, Hongyan, Chen, Xiyi

Drug-target interaction (DTI) prediction is of great significance for drug discovery and drug repurposing. With the accumulation of a large volume of valuable data, data-driven methods have been increasingly harnessed to predict DTIs, reducing costs across various dimensions. Therefore, this paper proposes a $\textbf{L}$arge $\textbf{L}$anguage $\textbf{M}$odel and $\textbf{M}$ulti-$\textbf{M}$odel data co-powered $\textbf{D}$rug $\textbf{T}$arget $\textbf{I}$nteraction prediction framework, named LLM$^3$-DTI. LLM$^3$-DTI constructs multi-modal data embedding to enhance DTI prediction performance. In this framework, the text semantic embeddings of drugs and targets are encoded by a domain-specific LLM. To effectively align and fuse multi-modal embedding. We propose the dual cross-attention mechanism and the TSFusion module. Finally, these multi-modal data are utilized for the DTI task through an output network. The experimental results indicate that LLM$^3$-DTI can proficiently identify validated DTIs, surpassing the performance of the models employed for comparison across diverse scenarios. Consequently, LLM$^3$-DTI is adept at fulfilling the task of DTI prediction with excellence. The data and code are available at https://github.com/chaser-gua/LLM3DTI.

large language model, machine learning, natural language, (20 more...)

2511.06269

Country:

North America > United States (0.93)
Asia > China > Zhejiang Province (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Government > Regional Government > North America Government > United States Government > FDA (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.80)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

arXiv.org Artificial IntelligenceNov-5-2025

GRAM-DTI: adaptive multimodal representation learning for drug target interaction prediction

Jiang, Feng, Mollaysa, Amina, Ma, Hehuan, Mansi, Tommaso, Huang, Junzhou, Prakash, Mangal, Liao, Rui

Drug target interaction (DTI) prediction is a cornerstone of computational drug discovery, enabling rational design, repurposing, and mechanistic insights. While deep learning has advanced DTI modeling, existing approaches primarily rely on SMILES protein pairs and fail to exploit the rich multimodal information available for small molecules and proteins. We introduce GRAMDTI, a pretraining framework that integrates multimodal molecular and protein inputs into unified representations. GRAMDTI extends volume based contrastive learning to four modalities, capturing higher-order semantic alignment beyond conventional pairwise approaches. To handle modality informativeness, we propose adaptive modality dropout, dynamically regulating each modality's contribution during pre-training. Additionally, IC50 activity measurements, when available, are incorporated as weak supervision to ground representations in biologically meaningful interaction strengths. Experiments on four publicly available datasets demonstrate that GRAMDTI consistently outperforms state of the art baselines. Our results highlight the benefits of higher order multimodal alignment, adaptive modality utilization, and auxiliary supervision for robust and generalizable DTI prediction.

artificial intelligence, machine learning, natural language, (20 more...)

2509.21971

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

arXiv.org Artificial IntelligenceOct-31-2025

GFlowNets for Learning Better Drug-Drug Interaction Representations

Wasi, Azmine Toushik

Drug-drug interactions pose a significant challenge in clinical pharmacology, with severe class imbalance among interaction types limiting the effectiveness of predictive models. Common interactions dominate datasets, while rare but critical interactions remain underrepre-sented, leading to poor model performance on infrequent cases. Existing methods often treat DDI prediction as a binary problem, ignoring class-specific nuances and exacerbating bias toward frequent interactions. To address this, we propose a framework combining Generative Flow Networks (GFlowNet) with Variational Graph Autoencoders (VGAE) to generate synthetic samples for rare classes, improving model balance and generate effective and novel DDI pairs. Our approach enhances predictive performance across interaction types, ensuring better clinical reliability.

artificial intelligence, data mining, machine learning, (17 more...)

2508.06576

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)