PolyPath: Adapting a Large Multimodal Model for Multi-slide Pathology Report Generation

Faruk Ahmed, Lin Yang, Tiam Jaroensri, Andrew Sellergren, Yossi Matias, Avinatan Hassidim, Greg S. Corrado, Dale R. Webster, Shravya Shetty, Shruthi Prabhakara, Yun Liu, Daniel Golden, Ellery Wulczyn, David F. Steiner

arXiv.org Artificial Intelligence 

Recent applications of vision-language modeling in digital histopathology have predominantly been designed to generate text describing individual regions of interest extracted from a single digitized histopathology image, or Whole Slide Image (WSI). An emerging line of research addresses the more practical clinical use case of slide-level text generation (Ahmed et al., 2024; Chen et al., 2024). In practice, however, a case can comprise multiple biological tissue parts, each with multiple slides. Pathologists write a report summarizing their part-level diagnostic findings by microscopically reviewing each of the available slides for a part and integrating information across these slides. This many-to-one relationship between slides and clinical descriptions is a recognized challenge for vision-language modeling in this space (Ahmed et al., 2024). The common approach in recent literature is to restrict modeling and analysis to single-slide cases, or to manually identify a single slide within a case or part that is most representative of the clinical findings in the report (Ahmed et al., 2024; Chen et al., 2024; Guo et al., 2024; Shaikovski et al., 2024; Xu et al., 2024; Zhou et al., 2024). This strategy of selecting representative slides was also adopted in constructing one of the most widely used histopathology datasets, TCGA (Cooper et al., 2018).
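The many-to-one structure described above can be made concrete with a small sketch. This is an illustrative example, not the paper's implementation: the record layout and identifiers (`case_id`, `part_id`, `slide_id`) are assumptions chosen to show how slides group under the parts that a single report section describes.

```python
from collections import defaultdict

# Hypothetical slide records: (case_id, part_id, slide_id).
# A pathology report is written per part, but each part can
# contain multiple slides -- the many-to-one relationship that
# complicates slide-level vision-language modeling.
slides = [
    ("case-1", "A", "A1"),
    ("case-1", "A", "A2"),
    ("case-1", "B", "B1"),
    ("case-2", "A", "A1"),
]

def group_slides_by_part(records):
    """Map (case_id, part_id) -> list of slide ids reviewed for that part."""
    parts = defaultdict(list)
    for case_id, part_id, slide_id in records:
        parts[(case_id, part_id)].append(slide_id)
    return dict(parts)

parts = group_slides_by_part(slides)
# Part A of case-1 has two slides; a part-level report must integrate
# findings across both, rather than rely on one representative slide.
```

Under this framing, the single-representative-slide strategy common in prior work amounts to keeping only one element of each list, whereas part-level report generation conditions on all of them.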
