Multi-Modal Foundation Models for Computational Pathology: A Survey
Li, Dong; Wan, Guihong; Wu, Xintao; Wu, Xinyu; Chen, Xiaohui; He, Yi; Lian, Christine G.; Sorger, Peter K.; Semenov, Yevgeniy R.; Zhao, Chen
arXiv.org Artificial Intelligence
Foundation models have emerged as a powerful paradigm in computational pathology (CPath), enabling scalable and generalizable analysis of histopathological images. While early developments centered on uni-modal models trained solely on visual data, recent advances have highlighted the promise of multi-modal foundation models that integrate heterogeneous data sources such as textual reports, structured domain knowledge, and molecular profiles. In this survey, we provide a comprehensive and up-to-date review of multi-modal foundation models in CPath, with a particular focus on models built upon hematoxylin and eosin (H&E) stained whole slide images (WSIs) and tile-level representations. We categorize 32 state-of-the-art multi-modal foundation models into three major paradigms: vision-language, vision-knowledge graph, and vision-gene expression. We further divide vision-language models into non-LLM-based and LLM-based approaches. Additionally, we analyze 28 available multi-modal datasets tailored for pathology, grouped into image-text pairs, instruction datasets, and image-other modality pairs. Our survey also presents a taxonomy of downstream tasks, highlights training and evaluation strategies, and identifies key challenges and future directions. We aim for this survey to serve as a valuable resource for researchers and practitioners working at the intersection of pathology and AI.
Mar-20-2025
- Country:
- North America > United States > Arkansas (0.14)
- Genre:
- Overview (1.00)
- Industry:
- Health & Medicine
- Diagnostic Medicine (1.00)
- Therapeutic Area > Oncology (0.93)
- Technology:
- Information Technology > Artificial Intelligence
- Machine Learning > Neural Networks
- Deep Learning (1.00)
- Natural Language
- Chatbot (1.00)
- Large Language Model (1.00)
- Representation & Reasoning (1.00)
- Vision (1.00)