Multi-Modal Foundation Models for Computational Pathology: A Survey

Li, Dong; Wan, Guihong; Wu, Xintao; Wu, Xinyu; Chen, Xiaohui; He, Yi; Lian, Christine G.; Sorger, Peter K.; Semenov, Yevgeniy R.; Zhao, Chen

Mar-20-2025, arXiv.org Artificial Intelligence

Foundation models have emerged as a powerful paradigm in computational pathology (CPath), enabling scalable and generalizable analysis of histopathological images. While early developments centered on uni-modal models trained solely on visual data, recent advances have highlighted the promise of multi-modal foundation models that integrate heterogeneous data sources such as textual reports, structured domain knowledge, and molecular profiles. In this survey, we provide a comprehensive and up-to-date review of multi-modal foundation models in CPath, with a particular focus on models built upon hematoxylin and eosin (H&E) stained whole slide images (WSIs) and tile-level representations. We categorize 32 state-of-the-art multi-modal foundation models into three major paradigms: vision-language, vision-knowledge graph, and vision-gene expression. We further divide vision-language models into non-LLM-based and LLM-based approaches. Additionally, we analyze 28 available multi-modal datasets tailored for pathology, grouped into image-text pairs, instruction datasets, and image-other modality pairs. Our survey also presents a taxonomy of downstream tasks, highlights training and evaluation strategies, and identifies key challenges and future directions. We aim for this survey to serve as a valuable resource for researchers and practitioners working at the intersection of pathology and AI.

arxiv preprint arxiv, large language model, machine learning

Country:
- North America > United States
  - Virginia > Williamsburg (0.04)
  - Texas > McLennan County
    - Waco (0.04)
  - Massachusetts > Middlesex County
    - Cambridge (0.04)
  - Arkansas > Washington County
    - Fayetteville (0.04)
- Europe > Germany
  - Bavaria > Upper Bavaria > Munich (0.04)
- Asia
  - Myanmar > Tanintharyi Region
    - Dawei (0.04)
  - China > Guangxi Province
    - Nanning (0.04)

Genre:
- Overview (1.00)

Industry:
- Health & Medicine
  - Diagnostic Medicine (1.00)
  - Therapeutic Area > Oncology (0.93)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Representation & Reasoning (1.00)
  - Natural Language
    - Large Language Model (1.00)
    - Chatbot (0.93)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)