
Collaborating Authors

 Regep, Cristian


BioNeMo Framework: a modular, high-performance library for AI model development in drug discovery

arXiv.org Artificial Intelligence

Artificial Intelligence models encoding biology and chemistry are opening new routes to high-throughput and high-quality in-silico drug development. However, their training increasingly relies on computational scale, with recent protein language models (pLMs) trained on hundreds of graphics processing units (GPUs). We introduce the BioNeMo Framework to facilitate the training of computational biology and chemistry AI models across hundreds of GPUs. Its modular design allows the integration of individual components, such as data loaders, into existing workflows and is open to community contributions. We detail technical features of the BioNeMo Framework through use cases such as pLM pre-training and fine-tuning. On 256 NVIDIA A100s, the BioNeMo Framework trains a three-billion-parameter BERT-based pLM on over one trillion tokens in 4.2 days. The BioNeMo Framework is open-source and free for everyone to use.
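
The training-scale claim can be sanity-checked with simple arithmetic; the sketch below derives the throughput implied by the quoted figures (token count, GPU count, and wall-clock days are taken from the abstract; nothing else is assumed).

    # Back-of-envelope throughput implied by the figures quoted above.
    tokens = 1.0e12                # "over one trillion tokens"
    gpus = 256                     # NVIDIA A100s
    days = 4.2                     # wall-clock training time

    seconds = days * 24 * 3600
    cluster_tok_per_s = tokens / seconds           # aggregate throughput: ~2.8M tokens/s
    per_gpu_tok_per_s = cluster_tok_per_s / gpus   # average: ~10.8k tokens/s per GPU

    print(f"cluster: {cluster_tok_per_s:,.0f} tokens/s")
    print(f"per GPU: {per_gpu_tok_per_s:,.0f} tokens/s")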


RECOVER: sequential model optimization platform for combination drug repurposing identifies novel synergistic compounds in vitro

arXiv.org Artificial Intelligence

For large libraries of small molecules, exhaustive combinatorial chemical screens become infeasible to perform when considering a range of disease models, assay conditions, and dose ranges. Deep learning models have achieved state-of-the-art results in silico for the prediction of synergy scores. However, databases of drug combinations are biased towards synergistic agents, and these results do not necessarily generalise out of distribution. We employ a sequential model optimization search utilising a deep learning model to quickly discover synergistic drug combinations active against a cancer cell line, requiring substantially less screening than an exhaustive evaluation. Our small-scale wet-lab experiments evaluate only ~5% of the total search space. After only 3 rounds of ML-guided in vitro experimentation (including a calibration round), we find that the set of drug pairs queried is enriched for highly synergistic combinations; two additional rounds of ML-guided experiments were performed to ensure reproducibility of trends. Remarkably, we rediscover drug combinations later confirmed to be under study in clinical trials. Moreover, we find that drug embeddings generated using only structural information begin to reflect mechanisms of action. Prior in silico benchmarking suggests we can enrich search queries for highly synergistic drug combinations by a factor of ~5-10x by using sequential rounds of evaluation compared with random selection, or by a factor of >3x when using a pretrained model to select all drug combinations at a single time point.
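
The search procedure described above follows a standard sequential model optimization (active-learning) pattern: fit a surrogate on all synergy measurements gathered so far, rank the unmeasured drug pairs by predicted synergy, send the top-ranked batch to the wet lab, and repeat. The sketch below illustrates that loop generically; the helper callables fit_model and run_assay, and the round and batch sizes, are placeholders rather than the authors' RECOVER implementation.

    # Generic sequential model optimization loop for drug-pair synergy screening.
    # fit_model(observed) -> callable scoring a pair; run_assay(pair) -> measured synergy.
    def sequential_search(candidate_pairs, fit_model, run_assay,
                          observed, n_rounds=3, batch_size=32):
        for _ in range(n_rounds):
            score = fit_model(observed)                     # retrain surrogate on all data so far
            pool = [p for p in candidate_pairs if p not in observed]
            ranked = sorted(pool, key=score, reverse=True)  # rank unmeasured pairs by predicted synergy
            for pair in ranked[:batch_size]:                # query the top-ranked pairs in vitro
                observed[pair] = run_assay(pair)
        return observed

The calibration round mentioned in the abstract corresponds to seeding the observed dictionary with an initial set of measured pairs before the first ML-guided round.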


Sparse Dynamic Distribution Decomposition: Efficient Integration of Trajectory and Snapshot Time Series Data

arXiv.org Machine Learning

Dynamic Distribution Decomposition (DDD) was introduced in Taylor-King et al. [30] as a variation on Dynamic Mode Decomposition. In brief, by using basis functions over a continuous state space, DDD allows for the fitting of continuous-time Markov chains over these basis functions and, as a result, continuously maps between distributions. The number of parameters in DDD scales with the square of the number of basis functions; we reformulate the problem and restrict the method to compact basis functions, which leads to the inference of sparse matrices only, hence reducing the number of parameters. Finally, we demonstrate how DDD is suitable for integrating both trajectory time series (paired between subsequent time points) and snapshot time series (unpaired time points). Methods capable of integrating both scenarios are particularly relevant for the analysis of biomedical data, whereby studies observe populations at fixed time points (snapshots) and individual patient journeys with repeated follow-ups (trajectories).
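
A schematic way to see the parameter count and the effect of compact support is sketched below in illustrative notation (it follows the general DDD setup described above rather than the paper's exact symbols, and the mass/Gram matrices of the full method are omitted): distributions are expanded in N basis functions, a generator Q is fitted over the coefficients, and compact support forces most couplings to zero.

    % Expand a distribution over the state space in N basis functions psi_i:
    \[
      p_t(x) \approx \sum_{i=1}^{N} c_i(t)\,\psi_i(x), \qquad c(t) \in \mathbb{R}^{N}.
    \]
    % DDD fits a continuous-time Markov generator Q linking coefficient vectors
    % at paired observation times via a matrix exponential:
    \[
      c(t_2) \approx e^{(t_2 - t_1) Q}\, c(t_1), \qquad Q \in \mathbb{R}^{N \times N}
      \;\Rightarrow\; O(N^2) \text{ free parameters in general}.
    \]
    % With compactly supported basis functions, couplings between basis functions
    % whose supports do not overlap can be fixed to zero, so Q is sparse and the
    % number of inferred parameters grows far more slowly than N^2:
    \[
      Q_{ij} = 0 \quad \text{whenever } \operatorname{supp}(\psi_i) \cap \operatorname{supp}(\psi_j) = \emptyset.
    \]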