Multi-view biomedical foundation models for molecule-target and property prediction
Suryanarayanan, Parthasarathy, Qiu, Yunguang, Sethi, Shreyans, Mahajan, Diwakar, Li, Hongyang, Yang, Yuxin, Eyigoz, Elif, Saenz, Aldo Guzman, Platt, Daniel E., Rumbell, Timothy H., Ng, Kenney, Dey, Sanjoy, Burch, Myson, Kwon, Bum Chul, Meyer, Pablo, Cheng, Feixiong, Hu, Jianying, Morrone, Joseph A.
Drug discovery is a complex, multi-stage process. Lead identification and lead optimization remain costly, with low success rates, and computational methods play an important role in accelerating these tasks [1-3]. The prediction of a broad range of chemical and biological properties of candidate molecules is an essential component of screening and assessing molecules, and data-driven machine learning approaches have long aided in this process [4-6]. Molecular representations form the basis of machine learning models [2, 7], facilitating algorithmic and scientific advances in the field. However, learning useful and generalizable latent representations is a hard problem due to limited amounts of labeled data, the wide range of downstream tasks, the vastness of chemical space, and the large heterogeneity of molecular structures. Learning latent representations with unsupervised techniques is vital for such models to scale. Large language models (LLMs) have revolutionized other fields [8], and similar sequence-based foundation models have shown promise in learning molecular representations and being trainable on many downstream property prediction tasks [9-11]. A key advantage is that the transformer-based architecture can learn in a self-supervised fashion to create a "pre-trained" molecular representation. The most direct application of LLM-like transformers is facilitated by a sequential, text-based representation (e.g.
arXiv.org Artificial Intelligence
Oct-25-2024
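
As a rough illustration of the self-supervised pre-training described in the abstract, the sketch below masks random tokens in SMILES strings and trains a small transformer encoder to recover them. It is a minimal sketch, not the authors' implementation: PyTorch, the toy corpus, the character-level vocabulary, the 15% masking rate, and the model dimensions are all illustrative assumptions.

# Minimal sketch (assumptions: PyTorch, character-level SMILES tokens, toy corpus;
# not the paper's code): masked-token pre-training of a small transformer encoder,
# i.e., the self-supervised recipe the abstract refers to.
import torch
import torch.nn as nn

SMILES = ["CCO", "c1ccccc1", "CC(=O)Oc1ccccc1C(=O)O"]       # toy corpus
PAD, MASK = 0, 1
stoi = {ch: i + 2 for i, ch in enumerate(sorted({c for s in SMILES for c in s}))}
vocab_size = len(stoi) + 2

def encode(s, max_len=32):
    ids = [stoi[c] for c in s][:max_len]
    return ids + [PAD] * (max_len - len(ids))

class SmilesEncoder(nn.Module):
    def __init__(self, d_model=64, nhead=4, nlayers=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, d_model, padding_idx=PAD)
        layer = nn.TransformerEncoderLayer(d_model, nhead, dim_feedforward=128,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, nlayers)
        self.head = nn.Linear(d_model, vocab_size)            # masked-token classifier

    def forward(self, x):
        h = self.encoder(self.emb(x), src_key_padding_mask=(x == PAD))
        return self.head(h), h                                # logits, latent states

model = SmilesEncoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss(ignore_index=-100)

for step in range(100):                                       # tiny pre-training loop
    batch = torch.tensor([encode(s) for s in SMILES])
    inputs, labels = batch.clone(), batch.clone()
    mask = (torch.rand(batch.shape) < 0.15) & (batch != PAD)
    if not mask.any():                                        # skip degenerate batches
        continue
    inputs[mask] = MASK                                       # corrupt ~15% of tokens
    labels[~mask] = -100                                      # score only masked positions
    logits, _ = model(inputs)
    loss = loss_fn(logits.reshape(-1, vocab_size), labels.reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()

After pre-training under this kind of objective, the encoder's latent states (pooled over the sequence) could serve as the molecular representation fed to downstream property-prediction heads.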