Synthetic Data from Diffusion Models Improve Drug Discovery Prediction
Bing Hu, Ashish Saragadam, Anita Layton, Helen Chen
arXiv.org Artificial Intelligence
There is a growing trend towards leveraging artificial intelligence (AI) in every stage of drug development [1]. Drug development is an expensive process: it costs $2-3 billion and takes 13-15 years to bring a single drug to market. By enabling high-throughput screening (HTS) of ligand candidates, drug discovery AI aims to reduce these developmental costs by transforming how ligands are designed and tested [2]. Drug development AI has seen early successes in areas such as poly-pharmacy [3], drug re-purposing [4, 5], drug-target interaction [6], drug response prediction [7], and the search for new antibiotics [8]. Equally important to advances in AI for drug discovery are improvements in the public data available for training and testing these models [9, 10, 11]. Breakthroughs in AI-based drug discovery happen only when progress in developing and refining drug discovery data is matched by progress in applying advanced AI models to that data. Huang et al. [9] noted three key data challenges that hinder attracting ML scientists to therapeutics: (1) a lack of AI-ready datasets and standardized knowledge representations; (2) datasets scattered across bio-repositories without curation; and (3) a lack of data focused on rare diseases and novel drugs in development. We posit another data challenge that slows the advancement of drug discovery AI: datasets are often collected independently, with little overlap between them, creating data sparsity. Data sparsity poses difficulties for researchers seeking to answer research questions that require data values spanning multiple datasets.
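The data sparsity problem described above can be illustrated with a small sketch. The dataset names, ligand IDs, and values below are entirely hypothetical, not drawn from the paper: two assay datasets collected independently share only one ligand, so merging them yields a table that is mostly missing values.

```python
import pandas as pd

# Hypothetical example: two independently collected assay datasets
# that overlap on only a single ligand.
binding = pd.DataFrame({
    "ligand": ["L1", "L2", "L3"],
    "binding_affinity": [7.2, 6.8, 5.9],
})
toxicity = pd.DataFrame({
    "ligand": ["L3", "L4", "L5"],
    "toxicity_score": [0.1, 0.4, 0.9],
})

# An outer join keeps every ligand, but because the datasets barely
# overlap, most rows now contain missing values -- the data sparsity
# that complicates cross-dataset research questions.
merged = binding.merge(toxicity, on="ligand", how="outer")

# Fraction of measurement cells that are missing after the merge.
sparsity = merged[["binding_affinity", "toxicity_score"]].isna().mean().mean()
```

Here 4 of the 10 measurement cells in `merged` are missing (sparsity 0.4); in practice, a model that needs both binding and toxicity values would have only one complete ligand to train on.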
May-6-2024