 inhibitor


DynaMate: An Autonomous Agent for Protein-Ligand Molecular Dynamics Simulations

Guilbert, Salomé, Masschelein, Cassandra, Goumaz, Jeremy, Naida, Bohdan, Schwaller, Philippe

arXiv.org Artificial Intelligence

Force field-based molecular dynamics (MD) simulations are indispensable for probing the structure, dynamics, and functions of biomolecular systems, including proteins and protein-ligand complexes. Despite their broad utility in drug discovery and protein engineering, the technical complexity of MD setup, encompassing parameterization, input preparation, and software configuration, remains a major barrier for widespread and efficient usage. Agentic LLMs have demonstrated their capacity to autonomously execute multi-step scientific processes, but to date they have not been successfully used to automate protein-ligand MD workflows. Here, we present DynaMate, a modular multi-agent framework that autonomously designs and executes complete MD workflows for both protein and protein-ligand systems, and offers free energy binding affinity calculations with the MM/PB(GB)SA method. The framework integrates dynamic tool use, web search, PaperQA, and a self-correcting behavior. DynaMate comprises three specialized modules, interacting to plan the experiment, perform the simulation, and analyze the results. We evaluated its performance across twelve benchmark systems of varying complexity, assessing success rate, efficiency, and adaptability. DynaMate reliably performed full MD simulations, corrected runtime errors through iterative reasoning, and produced meaningful analyses of protein-ligand interactions. This automated framework paves the way toward standardized, scalable, and time-efficient molecular modeling pipelines for future biomolecular and drug design applications.
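For readers unfamiliar with MM/PB(GB)SA, the end-point estimate it refers to is usually written as below. This is the standard textbook decomposition, not a formula taken from the DynaMate paper; the angle brackets denote averages over MD snapshots, and the entropy term is often approximated or omitted in practice.

```latex
\begin{align}
  \Delta G_{\text{bind}} &= \langle G_{\text{complex}} \rangle
      - \langle G_{\text{receptor}} \rangle
      - \langle G_{\text{ligand}} \rangle, \\
  G &= E_{\text{MM}} + G_{\text{solv}} - T S, \\
  G_{\text{solv}} &= G_{\text{polar}}^{\text{PB or GB}} + G_{\text{nonpolar}}.
\end{align}
```

Here $E_{\text{MM}}$ is the molecular mechanics energy (bonded, electrostatic, and van der Waals terms), and the polar solvation term comes from either a Poisson-Boltzmann (PB) or a Generalized Born (GB) model, which is what distinguishes MM/PBSA from MM/GBSA.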


Fine-Tuning ChemBERTa for Predicting Inhibitory Activity Against TDP1 Using Deep Learning

Zeng, Baichuan

arXiv.org Artificial Intelligence

Predicting the inhibitory potency of small molecules against Tyrosyl-DNA Phosphodiesterase 1 (TDP1) -- a key target in overcoming cancer chemoresistance -- remains a critical challenge in early drug discovery. We present a deep learning framework for the quantitative regression of pIC50 values from Simplified Molecular Input Line Entry System (SMILES) strings using fine-tuned variants of ChemBERTa, a pre-trained chemical language model. Leveraging a large-scale consensus dataset of 177,092 compounds, we systematically evaluate two pre-training strategies -- Masked Language Modeling (MLM) and Masked Token Regression (MTR) -- under stratified data splits and sample weighting to address severe activity imbalance, in which only 2.1% of compounds are active. Our approach outperforms a Random Predictor baseline in both regression accuracy and virtual screening utility and performs competitively with Random Forest, achieving a high enrichment factor (EF@1% = 17.4) and precision (Precision@1% = 37.4) among top-ranked predictions. The resulting model, validated through rigorous ablation and hyperparameter studies, provides a robust, ready-to-deploy tool for prioritizing TDP1 inhibitors for experimental testing. By enabling accurate, 3D-structure-free pIC50 prediction directly from SMILES, this work demonstrates the transformative potential of chemical transformers in accelerating target-specific drug discovery.
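The two screening metrics quoted above can be illustrated with a minimal sketch. It assumes the common definitions (precision among the top-ranked fraction, and enrichment factor as that precision divided by the overall active rate); the function name `top_fraction_metrics` and the toy data are hypothetical, and the paper's exact evaluation protocol may differ.

```python
from typing import Sequence, Tuple


def top_fraction_metrics(
    scores: Sequence[float], labels: Sequence[int], fraction: float = 0.01
) -> Tuple[float, float]:
    """Return (precision, enrichment factor) for the top `fraction` of compounds.

    scores: predicted activity (higher means more likely active), e.g. predicted pIC50.
    labels: 1 for an experimentally active compound, 0 otherwise.
    """
    if len(scores) != len(labels) or len(scores) == 0:
        raise ValueError("scores and labels must be non-empty and of equal length")

    # Number of compounds in the top-ranked slice (at least one).
    n_top = max(1, round(len(scores) * fraction))

    # Rank compounds by predicted score, best first.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits_in_top = sum(label for _, label in ranked[:n_top])

    precision = hits_in_top / n_top
    base_rate = sum(labels) / len(labels)  # fraction of actives in the whole set
    enrichment = precision / base_rate if base_rate > 0 else float("nan")
    return precision, enrichment


# Toy example: 200 compounds, 4 actives (2%), one active among the top 2 (top 1%).
toy_scores = [200 - i for i in range(200)]
toy_labels = [1 if i in (0, 50, 100, 150) else 0 for i in range(200)]
precision, enrichment = top_fraction_metrics(toy_scores, toy_labels, fraction=0.01)
```

Under these definitions, an enrichment factor above 1 means the top-ranked slice contains proportionally more actives than a random selection from the same set would.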




MADD: Multi-Agent Drug Discovery Orchestra

Solovev, Gleb V., Zhidkovskaya, Alina B., Orlova, Anastasia, Gubina, Nina, Vepreva, Anastasia, Golovinskii, Rodion, Tonkii, Ilya, Dubrovsky, Ivan, Gurev, Ivan, Gilemkhanov, Dmitry, Chistiakov, Denis, Aliev, Timur A., Poddiakov, Ivan, Zubkova, Galina, Skorb, Ekaterina V., Vinogradov, Vladimir, Boukhanovsky, Alexander, Nikitin, Nikolay, Dmitrenko, Andrei, Kalyuzhnaya, Anna, Savchenko, Andrey

arXiv.org Artificial Intelligence

Hit identification is a central challenge in early drug discovery, traditionally requiring substantial experimental resources. Recent advances in artificial intelligence, particularly large language models (LLMs), have enabled virtual screening methods that reduce costs and improve efficiency. However, the growing complexity of these tools has limited their accessibility to wet-lab researchers. Multi-agent systems offer a promising solution by combining the interpretability of LLMs with the precision of specialized models and tools. In this work, we present MADD, a multi-agent system that builds and executes customized hit identification pipelines from natural language queries. MADD employs four coordinated agents to handle key subtasks in de novo compound generation and screening. We evaluate MADD across seven drug discovery cases and demonstrate its superior performance compared to existing LLM-based solutions. Using MADD, we pioneer the application of AI-first drug design to five biological targets and release the identified hit molecules. Finally, we introduce a new benchmark of query-molecule pairs and docking scores for over three million compounds to contribute to the agentic future of drug design.


Symbolic Neural Generation with Applications to Lead Discovery in Drug Design

Srinivasan, Ashwin, Baskar, A, Dash, Tirtharaj, Bain, Michael, Dey, Sanjay Kumar, Banerjee, Mainak

arXiv.org Artificial Intelligence

We investigate a relatively underexplored class of hybrid neurosymbolic models integrating symbolic learning with neural reasoning to construct data generators meeting formal correctness criteria. In \textit{Symbolic Neural Generators} (SNGs), symbolic learners examine logical specifications of feasible data from a small set of instances -- sometimes just one. Each specification in turn constrains the conditional information supplied to a neural-based generator, which rejects any instance violating the symbolic specification. Like other neurosymbolic approaches, SNG exploits the complementary strengths of symbolic and neural methods. The outcome of an SNG is a triple $(H, X, W)$, where $H$ is a symbolic description of feasible instances constructed from data, $X$ a set of generated new instances that satisfy the description, and $W$ an associated weight. We introduce a semantics for such systems, based on the construction of appropriate \textit{base} and \textit{fibre} partially-ordered sets combined into an overall partial order, and outline a probabilistic extension relevant to practical applications. In this extension, SNGs result from searching over a weighted partial ordering. We implement an SNG combining a restricted form of Inductive Logic Programming (ILP) with a large language model (LLM) and evaluate it on early-stage drug design. Our main interest is the description and the set of potential inhibitor molecules generated by the SNG. On benchmark problems -- where drug targets are well understood -- SNG performance is statistically comparable to state-of-the-art methods. On exploratory problems with poorly understood targets, generated molecules exhibit binding affinities on par with leading clinical candidates. Experts further find the symbolic specifications useful as preliminary filters, with several generated molecules identified as viable for synthesis and wet-lab testing.
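The generate-then-reject loop described in this abstract can be sketched as follows. This is only an illustration under loud assumptions: the random `toy_generator` stands in for the neural generator, the hand-written `EXAMPLE_SPEC` stands in for a specification learned by ILP, and taking the weight $W$ to be the acceptance fraction is a placeholder choice, not the paper's definition.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Instance = Dict[str, float]


@dataclass
class Spec:
    """A symbolic description H: human-readable text plus a checkable predicate."""
    description: str
    holds: Callable[[Instance], bool]


def toy_generator(rng: random.Random) -> Instance:
    # Hypothetical stand-in for a neural generator: draws crude molecule descriptors.
    return {"rings": float(rng.randint(0, 5)), "mol_weight": rng.uniform(100.0, 700.0)}


def generate_with_spec(
    spec: Spec, n_draws: int, seed: int = 0
) -> Tuple[Spec, List[Instance], float]:
    """Return a triple (H, X, W): the spec, the accepted instances, and a weight."""
    rng = random.Random(seed)
    candidates = [toy_generator(rng) for _ in range(n_draws)]
    # Reject every candidate that violates the symbolic specification.
    accepted = [c for c in candidates if spec.holds(c)]
    # Assumption: use the acceptance fraction as the weight, purely for illustration.
    weight = len(accepted) / n_draws if n_draws else 0.0
    return spec, accepted, weight


# Hypothetical specification; a real one would be learned from example instances.
EXAMPLE_SPEC = Spec(
    description="mol_weight <= 500 and rings <= 3",
    holds=lambda m: m["mol_weight"] <= 500.0 and m["rings"] <= 3,
)

H, X, W = generate_with_spec(EXAMPLE_SPEC, n_draws=200, seed=1)
```

The point of the sketch is the separation of roles: the specification is an inspectable object that experts can read, while the generator is treated as a black box whose outputs are filtered against it.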


Artificial Intelligence Powered Identification of Potential Antidiabetic Compounds in Ficus religiosa

Alam, Md Ashad, Amanullah, Md

arXiv.org Artificial Intelligence

Diabetes mellitus is a chronic metabolic disorder that necessitates novel therapeutic innovations due to its gradual progression and the onset of various metabolic complications. Research indicates that Ficus religiosa is a conventional medicinal plant that generates bioactive phytochemicals with potential antidiabetic properties. The investigation employs ecosystem-based computational approaches utilizing artificial intelligence to investigate and evaluate compounds derived from Ficus religiosa that exhibit antidiabetic properties. A comprehensive computational procedure incorporated machine learning methodologies, molecular docking techniques, and ADMET prediction systems to assess phytochemical efficacy against the significant antidiabetic enzyme dipeptidyl peptidase-4 (DPP-4). Flavonoids and alkaloids have emerged as attractive phytochemicals due to their strong binding interactions and advantageous pharmacological effects, as indicated by the study. The introduction of AI accelerated screening procedures and enhanced accuracy rates, demonstrating its efficacy in researching plant-based antidiabetic agents. The scientific foundation now facilitates future experimental validation of natural product therapies tailored for diabetic management. Introduction: Type 2 diabetes mellitus is a global metabolic illness characterized by persistent hyperglycemia due to impaired insulin secretion and resistance [1], [2]. T2DM constitutes a growing global health issue, significantly burdening healthcare systems economically [5]. Research in 2021 indicated that 537 million persons globally had diabetes, with projections estimating 643 million cases by 2030 [6].


Supplement WelQrate: Defining the Gold Standard in Small Molecule Drug Discovery Benchmarking

Neural Information Processing Systems

A closer look at the MedDRA classification at the system organ level on its website reveals a claim of "System Organ Classes (SOCs) which are groupings by aetiology (e.g. However, as claimed in the original paper, "It should be noted that we did not perform any preprocessing of our datasets, such as Tab. These datasets appear in MoleculeNet as well. As mentioned in the introduction of the main paper, there are also issues with inconsistent representations and undefined stereochemistry. We list an example for each in Figure 1 and Figure 1.


Decoding the dark proteome: Deep learning-enabled discovery of druggable enzymes in Wuchereria bancrofti

Shivakumar, Shawnak, Hernandez, Jefferson

arXiv.org Artificial Intelligence

Wuchereria bancrofti, the parasitic roundworm responsible for lymphatic filariasis, permanently disables over 36 million people and places 657 million at risk across 39 countries. A major bottleneck for drug discovery is the lack of functional annotation for more than 90 percent of the W. bancrofti dark proteome, leaving many potential targets unidentified. In this work, we present a novel computational pipeline that converts W. bancrofti's unannotated amino acid sequence data into precise four-level Enzyme Commission (EC) numbers and drug candidates. We utilized a DEtection TRansformer to estimate the probability of enzymatic function, fine-tuned a hierarchical nearest neighbor EC predictor on 4,476 labeled parasite proteins, and applied rejection sampling to retain only four-level EC classifications at 100 percent confidence. This pipeline assigned precise EC numbers to 14,772 previously uncharacterized proteins and discovered 543 EC classes not previously known in W. bancrofti. A qualitative triage emphasizing parasite-specific targets, chemical tractability, biochemical importance, and biological plausibility prioritized six enzymes across five separate strategies: anti-Wolbachia cell-wall inhibition, proteolysis blockade, transmission disruption, purinergic immune interference, and cGMP-signaling destabilization. We curated a 43-compound library from ChEMBL and BindingDB and co-folded across multiple protein conformers with Boltz-2. All six targets exhibited at least moderately strong predicted binding affinities below 1 micromolar, with moenomycin analogs against peptidoglycan glycosyltransferase and NTPase inhibitors showing promising nanomolar hits and well-defined binding pockets. While experimental validation remains essential, our results provide the first large-scale functional map of the W. bancrofti dark proteome and accelerate early-stage drug development for the species.
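The retention step mentioned in this abstract (keeping only complete four-level EC assignments at full confidence) can be approximated by a simple filter. This sketch is a simplification and not the authors' rejection sampling procedure: the `Prediction` record, the protein identifiers, and the confidence values are all hypothetical, and the regular expression deliberately ignores preliminary EC numbers.

```python
import re
from typing import List, NamedTuple


class Prediction(NamedTuple):
    protein_id: str    # hypothetical identifier
    ec_number: str     # e.g. "3.6.1.15", or "2.7.11.-" when the last level is unknown
    confidence: float  # model confidence in [0, 1]


# A complete EC number has four numeric levels separated by dots.
FOUR_LEVEL_EC = re.compile(r"^\d+\.\d+\.\d+\.\d+$")


def retain_confident_four_level(
    predictions: List[Prediction], min_confidence: float = 1.0
) -> List[Prediction]:
    """Keep predictions with a fully specified EC number and enough confidence."""
    return [
        p
        for p in predictions
        if FOUR_LEVEL_EC.match(p.ec_number) and p.confidence >= min_confidence
    ]


# Made-up example records, for illustration only.
example_predictions = [
    Prediction("protein_A", "3.6.1.15", 1.0),   # kept: four levels, full confidence
    Prediction("protein_B", "2.7.11.-", 1.0),   # dropped: fourth level unresolved
    Prediction("protein_C", "1.1.1.1", 0.93),   # dropped: below the threshold
]
kept = retain_confident_four_level(example_predictions)
```

A strict threshold like this trades coverage for precision: fewer proteins receive an annotation, but each retained annotation is one the classifier was maximally confident about.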