KinDEL: DNA-Encoded Library Dataset for Kinase Inhibitors

Benson Chen, Tomasz Danel, Patrick J. McEnaney, Nikhil Jain, Kirill Novikov, Spurti Umesh Akki, Joshua L. Turnbull, Virja Atul Pandya, Boris P. Belotserkovskii, Jared Bryce Weaver, Ankita Biswas, Dat Nguyen, Gabriel H. S. Dreiman, Mohammad Sultan, Nathaniel Stanley, Daniel M. Whalen, Divya Kanichar, Christoph Klein, Emily Fox, R. Edward Watts

Oct-11-2024–arXiv.org Artificial Intelligence

DNA-Encoded Libraries (DELs) are combinatorial small molecule libraries that offer an efficient way to characterize diverse chemical spaces. Selection experiments using DELs are pivotal to drug discovery efforts, enabling high-throughput screens for hit finding. However, limited availability of public DEL datasets hinders the advancement of computational techniques designed to utilize such data. To bridge this gap, we present KinDEL, one of the first large, publicly available DEL datasets on two kinases: Mitogen-Activated Protein Kinase 14 (MAPK14) and Discoidin Domain Receptor Tyrosine Kinase 1 (DDR1). Interest in this data modality is growing due to its ability to generate extensive supervised chemical data that densely samples around select molecular structures. Demonstrating one such application of the data, we benchmark different machine learning techniques to develop predictive models for hit identification; in particular, we highlight recent structure-based probabilistic approaches. Finally, we provide biophysical assay data, both on- and off-DNA, to validate our models on a smaller subset of molecules. Data and code for our benchmarks can be found at https://github.com/insitro/kindel.

DNA-Encoded Libraries (DELs) have emerged as a powerful tool in drug discovery, enabling highly efficient screens of small molecule libraries against therapeutically relevant targets (Yuen & Franzini, 2017; Gironda-Martínez et al., 2021; Kunig et al., 2021; Peterson & Liu, 2023). These massive libraries are efficiently constructed through combinatorial synthesis of chemical building blocks, or synthons, with each resulting molecule being assigned a DNA barcode (see Figure 1). DELs are then used in selection experiments against proteins of interest, wherein multiple rounds of washing are conducted to remove any weak binders, and the DNA tags of surviving molecules are sequenced as a measure of binding affinity. Despite the high throughput of DELs, the data generated through these experiments are intrinsically noisy, with various sources of bias arising from the DEL synthesis and selection processes. Modern machine learning methods are therefore needed to learn signal from the data. Unfortunately, there is still a lack of large, publicly available DEL datasets and benchmarking tasks to drive this important research area.
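To make the combinatorial construction and barcode readout described above concrete, the following is a minimal Python sketch. It is illustrative only: the synthon names, the three-cycle layout, the four-base codon scheme, and the helper functions are all hypothetical and are not taken from KinDEL or its code repository. It enumerates every combination of synthons, assigns each product a DNA barcode, and tallies sequenced barcodes back to molecules.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"

# Hypothetical synthon pools for a three-cycle library (placeholder names,
# not building blocks from KinDEL).
SYNTHONS = {
    "cycle_a": ["A1", "A2"],
    "cycle_b": ["B1", "B2", "B3"],
    "cycle_c": ["C1", "C2"],
}


def codon(index, length=4):
    """Encode a non-negative integer as a fixed-length base-4 DNA string."""
    if not 0 <= index < 4 ** length:
        raise ValueError("index does not fit in a codon of this length")
    digits = []
    for _ in range(length):
        index, remainder = divmod(index, 4)
        digits.append(BASES[remainder])
    return "".join(reversed(digits))


def enumerate_library(synthons):
    """Map each barcode to the tuple of synthons it encodes.

    The barcode is the concatenation of one codon per cycle, so the
    position of a codon identifies the cycle and its value the synthon.
    """
    indexed_cycles = [list(enumerate(pool)) for pool in synthons.values()]
    library = {}
    for combo in product(*indexed_cycles):
        barcode = "".join(codon(i) for i, _ in combo)
        library[barcode] = tuple(name for _, name in combo)
    return library


def count_reads(reads, library):
    """Tally sequenced barcodes per molecule, ignoring unknown barcodes."""
    return Counter(library[read] for read in reads if read in library)


library = enumerate_library(SYNTHONS)
print(len(library))  # 2 * 3 * 2 combinations
```

The library size is the product of the pool sizes, which is why a few hundred synthons per cycle can yield millions of molecules. As the text notes, raw read counts are a noisy proxy for binding affinity, so a tally like `count_reads` would only be the starting point for any modeling.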
