Towards Robust Foundation Models for Digital Pathology
Kömen, Jonah; de Jong, Edwin D.; Hense, Julius; Marienwald, Hannah; Dippel, Jonas; Naumann, Philip; Marcus, Eric; Ruff, Lukas; Alber, Maximilian; Teuwen, Jonas; Klauschen, Frederick; Müller, Klaus-Robert
arXiv.org Artificial Intelligence
Biomedical Foundation Models (FMs) are rapidly transforming AI-enabled healthcare research and entering clinical validation. However, their susceptibility to learning non-biological technical features -- including variations in surgical/endoscopic techniques, laboratory procedures, and scanner hardware -- poses risks for clinical deployment. We present the first systematic investigation of pathology FM robustness to non-biological features. Our work (i) introduces measures to quantify FM robustness, (ii) demonstrates the consequences of limited robustness, and (iii) proposes a framework for FM robustification to mitigate these issues. Specifically, we developed PathoROB, a robustness benchmark with three novel metrics, including the robustness index, and four datasets covering 28 biological classes from 34 medical centers. Our experiments reveal robustness deficits across all 20 evaluated FMs, and substantial robustness differences between them. We found that non-robust FM representations can cause major diagnostic downstream errors and clinical blunders that prevent safe clinical adoption. Using more robust FMs and post-hoc robustification considerably reduced (but did not yet eliminate) the risk of such errors. This work establishes that robustness evaluation is essential for validating pathology FMs before clinical adoption and demonstrates that future FM development must integrate robustness as a core design principle. PathoROB provides a blueprint for assessing robustness across biomedical domains, guiding FM improvement efforts towards more robust, representative, and clinically deployable AI systems that prioritize biological information over technical artifacts.
Jul-25-2025
- Country:
  - Asia > South Korea
  - Europe
    - Germany
      - Bavaria > Upper Bavaria
        - Munich (0.04)
      - Berlin (0.04)
    - Netherlands > North Holland
      - Amsterdam (0.04)
    - Poland (0.04)
  - North America
    - Canada > Alberta (0.04)
    - United States
      - California > San Francisco County
        - San Francisco (0.14)
      - North Carolina (0.04)
- Genre:
  - Research Report
    - Experimental Study (0.93)
    - New Finding (1.00)
- Industry:
  - Health & Medicine
    - Diagnostic Medicine > Imaging (0.92)
    - Health Care Providers & Services (1.00)
    - Therapeutic Area > Oncology (1.00)
- Technology:
  - Information Technology
    - Artificial Intelligence
      - Machine Learning
        - Neural Networks > Deep Learning (1.00)
        - Performance Analysis > Accuracy (0.92)
        - Statistical Learning (1.00)
      - Natural Language > Large Language Model (0.67)
      - Vision (1.00)
    - Data Science > Data Mining (1.00)
    - Sensing and Signal Processing > Image Processing (1.00)