van der Sloot, Almer
RGFN: Synthesizable Molecular Generation Using GFlowNets
Koziarski, Michał, Rekesh, Andrei, Shevchuk, Dmytro, van der Sloot, Almer, Gaiński, Piotr, Bengio, Yoshua, Liu, Cheng-Hao, Tyers, Mike, Batey, Robert A.
Generative models hold great promise for small molecule discovery, significantly increasing the size of the search space compared to traditional in silico screening libraries. However, most existing machine learning methods for small molecule generation suffer from poor synthesizability of candidate compounds, making experimental validation difficult. In this paper we propose Reaction-GFlowNet (RGFN), an extension of the GFlowNet framework that operates directly in the space of chemical reactions, thereby allowing out-of-the-box synthesizability while maintaining comparable quality of generated candidates. We demonstrate that with the proposed set of reactions and building blocks, it is possible to obtain a search space of molecules orders of magnitude larger than existing screening libraries coupled with a low cost of synthesis. We also show that the approach scales to very large fragment libraries, further increasing the number of potential molecules. We demonstrate the effectiveness of the proposed approach across a range of oracle models, including pretrained proxy models and GPU-accelerated docking.
Towards DNA-Encoded Library Generation with GFlowNets
Koziarski, Michał, Abukalam, Mohammed, Shah, Vedant, Vaillancourt, Louis, Schuetz, Doris Alexandra, Jain, Moksh, van der Sloot, Almer, Bourgey, Mathieu, Marinier, Anne, Bengio, Yoshua
DNA-encoded libraries (DELs) are a powerful approach for rapidly screening large numbers of diverse compounds. One of the key challenges in using DELs is library design, which involves choosing the building blocks that will be combinatorially combined to produce the final library. In this paper we consider the task of protein-protein interaction (PPI) biased DEL design. To this end, we evaluate several machine learning algorithms on the PPI modulation task and use them as a reward for the proposed GFlowNet-based generative approach. We additionally investigate the possibility of using structural information about building blocks to design a hierarchical action space for the GFlowNet. The observed results indicate that GFlowNets are a promising approach for generating diverse combinatorial library candidates.
RECOVER: sequential model optimization platform for combination drug repurposing identifies novel synergistic compounds in vitro
Bertin, Paul, Rector-Brooks, Jarrid, Sharma, Deepak, Gaudelet, Thomas, Anighoro, Andrew, Gross, Torsten, Martinez-Pena, Francisco, Tang, Eileen L., S, Suraj M, Regep, Cristian, Hayter, Jeremy, Korablyov, Maksym, Valiante, Nicholas, van der Sloot, Almer, Tyers, Mike, Roberts, Charles, Bronstein, Michael M., Lairson, Luke L., Taylor-King, Jake P., Bengio, Yoshua
For large libraries of small molecules, exhaustive combinatorial chemical screens become infeasible to perform when considering a range of disease models, assay conditions, and dose ranges. Deep learning models have achieved state-of-the-art results in silico for the prediction of synergy scores. However, databases of drug combinations are biased towards synergistic agents and these results do not necessarily generalise out of distribution. We employ a sequential model optimization search utilising a deep learning model to quickly discover synergistic drug combinations active against a cancer cell line, requiring substantially less screening than an exhaustive evaluation. Our small-scale wet-lab experiments account for evaluation of only ~5% of the total search space. After only 3 rounds of ML-guided in vitro experimentation (including a calibration round), we find that the set of drug pairs queried is enriched for highly synergistic combinations; two additional rounds of ML-guided experiments were performed to ensure reproducibility of trends. Remarkably, we rediscover drug combinations later confirmed to be under study within clinical trials. Moreover, we find that drug embeddings generated using only structural information begin to reflect mechanisms of action. Prior in silico benchmarking suggests we can enrich search queries by a factor of ~5-10x for highly synergistic drug combinations by using sequential rounds of evaluation when compared to random selection, or by a factor of >3x when using a pretrained model selecting all drug combinations at a single time point.