carcinoma
MAPLE: Multi-scale Attribute-enhanced Prompt Learning for Few-shot Whole Slide Image Classification
Prompt learning has emerged as a promising paradigm for adapting pre-trained vision-language models (VLMs) to few-shot whole slide image (WSI) classification: by aligning visual features with textual representations, it reduces annotation cost and improves model generalization. However, existing methods typically rely on slide-level prompts and fail to capture the subtype-specific phenotypic variations of histological entities (e.g., nuclei, glands) that are critical for cancer diagnosis. To address this gap, we propose Multi-scale Attribute-enhanced Prompt Learning (MAPLE), a hierarchical framework for few-shot WSI classification that integrates multi-scale visual semantics and performs prediction at both the entity and slide levels. Specifically, we first use large language models (LLMs) to generate entity-level prompts, which help identify multi-scale histological entities and their phenotypic attributes, as well as slide-level prompts that capture global visual descriptions. Next, an entity-guided cross-attention module generates entity-level features, which are aligned with their corresponding subtype-specific attributes for fine-grained entity-level prediction. To enrich the entity representations, we further develop a cross-scale entity graph learning module that updates them by capturing their semantic correlations within and across scales. The refined representations are then aggregated into a slide-level representation and aligned with the corresponding prompts for slide-level prediction. Finally, we combine the entity-level and slide-level outputs to produce the final prediction. Results on three cancer cohorts confirm the effectiveness of our approach for few-shot pathology diagnosis.
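The entity-level pipeline described above (entity prompts attending over patch features, alignment with subtype-specific attribute text, then fusion with the slide-level prediction) can be illustrated with a minimal sketch. It is not the MAPLE implementation: it assumes text and patch embeddings already live in one shared space, omits learned query/key/value projections and the graph module, and the temperature `0.07`, the mixing weight `alpha`, and every function name are hypothetical choices for illustration.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Cosine similarity; assumes neither vector is all zeros.
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def softmax(xs: Sequence[float]) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def entity_cross_attention(entity_queries: Sequence[Vector],
                           patch_feats: Sequence[Vector]) -> List[Vector]:
    """Each entity-prompt embedding (query) attends over all patch features.
    Keys and values are the raw patch features (no learned projections),
    giving one pooled entity-level feature per entity."""
    d = len(patch_feats[0])
    entity_feats = []
    for q in entity_queries:
        weights = softmax([dot(q, p) / math.sqrt(d) for p in patch_feats])
        entity_feats.append(
            [sum(w * p[k] for w, p in zip(weights, patch_feats)) for k in range(d)])
    return entity_feats


def class_probs(feature: Vector, class_text_embs: Sequence[Vector],
                temperature: float = 0.07) -> Vector:
    """CLIP-style alignment: softmax over cosine similarity to one text embedding per subtype."""
    return softmax([cosine(feature, t) / temperature for t in class_text_embs])


def combine(entity_probs: Sequence[Vector], slide_probs: Vector,
            alpha: float = 0.5) -> Vector:
    """Average entity-level class probabilities, then mix with slide-level ones
    using the hypothetical weight alpha."""
    n = len(entity_probs)
    entity_mean = [sum(p[c] for p in entity_probs) / n for c in range(len(slide_probs))]
    return [alpha * e + (1 - alpha) * s for e, s in zip(entity_mean, slide_probs)]


if __name__ == "__main__":
    patches = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]   # toy patch embeddings
    entities = [[1.0, 0.0], [0.0, 1.0]]               # e.g. "nuclei", "glands" prompts
    feats = entity_cross_attention(entities, patches)
    # Hypothetical attribute embeddings: attrs[entity][subtype].
    attrs = [[[1.0, 0.2], [0.6, 0.4]], [[0.3, 1.0], [0.8, 0.2]]]
    ent_p = [class_probs(f, a) for f, a in zip(feats, attrs)]
    slide_feat = [sum(col) / len(feats) for col in zip(*feats)]
    slide_p = class_probs(slide_feat, [[1.0, 0.5], [0.5, 1.0]])
    print(combine(ent_p, slide_p))
```

Averaging over entities before mixing is one simple fusion choice; a learned or confidence-weighted fusion would fit the description equally well.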
Assessing the Feasibility of Early Cancer Detection Using Routine Laboratory Data: An Evaluation of Machine Learning Approaches on an Imbalanced Dataset
The development of accessible screening tools for early cancer detection in dogs represents a significant challenge in veterinary medicine. Routine laboratory data offer a promising, low-cost source for such tools, but their utility is hampered by the non-specificity of individual biomarkers and the severe class imbalance inherent in screening populations. This study assesses the feasibility of cancer risk classification using the Golden Retriever Lifetime Study (GRLS) cohort under real-world constraints, including the grouping of diverse cancer types and the inclusion of post-diagnosis samples. A comprehensive benchmark evaluation was conducted, systematically comparing 126 analytical pipelines that combined various machine learning models, feature selection methods, and data balancing techniques. Data were partitioned at the patient level to prevent leakage. The best-performing model, a Logistic Regression classifier with class weighting and recursive feature elimination, demonstrated moderate ranking ability (AUROC = 0.815; 95% CI: 0.793-0.836) but poor clinical classification performance (F1-score = 0.25, Positive Predictive Value = 0.15). Although a high Negative Predictive Value (0.98) was achieved, the insufficient recall (0.79) precludes its use as a reliable rule-out test. Interpretability analysis with SHapley Additive exPlanations (SHAP) revealed that predictions were driven by non-specific features such as age and markers of inflammation and anemia. We conclude that while a statistically detectable cancer signal exists in routine lab data, it is too weak and confounded to reliably distinguish cancer from normal aging or other inflammatory conditions. This work establishes a critical performance ceiling for this data modality in isolation and underscores that meaningful progress in computational veterinary oncology will require the integration of multi-modal data sources.
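The gap between a moderate AUROC and a low PPV follows directly from class imbalance, and the reported F1 is consistent with the other figures: 2 × 0.15 × 0.79 / (0.15 + 0.79) ≈ 0.25. The sketch below shows how these screening metrics derive from a confusion matrix and, via Bayes' rule, how PPV falls with prevalence. The confusion counts, the specificity of 0.90, and the prevalence of 5% used here are hypothetical, not values from the GRLS study.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # cancer cases flagged as cancer
    fp: int  # cancer-free cases flagged as cancer
    tn: int  # cancer-free cases correctly cleared
    fn: int  # cancer cases missed


def screening_metrics(c: Confusion) -> dict:
    recall = c.tp / (c.tp + c.fn)            # sensitivity
    ppv = c.tp / (c.tp + c.fp)               # precision
    npv = c.tn / (c.tn + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    f1 = 2 * ppv * recall / (ppv + recall)   # harmonic mean of PPV and recall
    return {"recall": recall, "ppv": ppv, "npv": npv,
            "specificity": specificity, "f1": f1}


def ppv_from_prevalence(sensitivity: float, specificity: float,
                        prevalence: float) -> float:
    """Bayes' rule: expected PPV when a test is applied to a population."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical scenario: recall as reported, specificity and prevalence assumed.
print(round(ppv_from_prevalence(0.79, 0.90, 0.05), 2))  # prints 0.29
```

Even with 90% specificity, a 5% prevalence means false positives outnumber true positives roughly 2.4 to 1, which is why a high NPV can coexist with a PPV well below 0.5.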
PathReasoning: A multimodal reasoning agent for query-based ROI navigation on whole-slide images
Zhang, Kunpeng, Xu, Hanwen, Wang, Sheng
Deciphering the tumor microenvironment from whole slide images (WSIs) is important, as it is key to cancer diagnosis, prognosis, and treatment response. While these gigapixel images offer a comprehensive portrait of the cancer, their extreme size, which can exceed 10 billion pixels, makes navigating to the regions relevant to diverse clinical inspections challenging and time-consuming. Inspired by how pathologists navigate WSIs through a combination of sampling, reasoning, and self-reflection, we propose "PathReasoning", a multimodal reasoning agent that iteratively navigates WSIs through multiple rounds of reasoning and refinement. Specifically, starting from randomly sampled candidate regions, PathReasoning reviews its current selections with self-reflection, reasons over the correspondence between visual observations and clinical questions, and concludes by proposing new regions to explore. Across rounds, PathReasoning builds a reasoning chain that gradually directs attention to diagnostically relevant areas. PathReasoning turns each whole slide into a sequence of question-guided views, allowing the model to efficiently find informative ROIs within a fixed number of steps without requiring dense pixel-level annotations. PathReasoning substantially outperforms strong ROI-selection approaches, by 6.7% and 3.1% AUROC on subtyping and longitudinal analysis tasks, respectively. The high-quality ROIs further support accurate report generation for breast cancer, significantly outperforming standard GPT-4o by 10% in accuracy. By prioritizing question-specific regions and constructing interpretable reasoning chains, PathReasoning supports efficient slide review, consistent diagnostic interpretation, comprehensive reporting, and evidence traceability in digital pathology.
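The sample, reflect, and propose loop with a fixed step budget can be sketched as a search over a grid of tiles. In PathReasoning the judging and proposing are done by a multimodal model reasoning about the clinical question; here that is replaced by a hypothetical `relevance` scoring function, and "proposing new regions" is simplified to exploring the neighbours of the current best tiles. All names and parameters are illustrative assumptions.

```python
import random
from typing import Callable, List, Set, Tuple

Tile = Tuple[int, int]  # (column, row) index of a tile on the slide grid


def navigate(grid_w: int, grid_h: int, relevance: Callable[[Tile], float],
             n_candidates: int = 8, n_rounds: int = 5, keep: int = 3,
             seed: int = 0) -> List[Tile]:
    """Fixed-budget sample -> reflect -> propose loop over a tile grid.

    `relevance` is a hypothetical stand-in for the agent's judgement of how
    well a tile answers the clinical question.
    """
    rng = random.Random(seed)
    # Start: randomly sampled candidate regions.
    candidates: Set[Tile] = {(rng.randrange(grid_w), rng.randrange(grid_h))
                             for _ in range(n_candidates)}
    selected: List[Tile] = []
    for _ in range(n_rounds):
        # "Reflection": re-rank previous picks together with the new candidates.
        pool = set(selected) | candidates
        selected = sorted(pool, key=lambda t: (relevance(t), t), reverse=True)[:keep]
        # "Proposal": explore the 4-neighbourhood of the current best tiles.
        candidates = set()
        for x, y in selected:
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nx, ny = x + dx, y + dy
                if 0 <= nx < grid_w and 0 <= ny < grid_h:
                    candidates.add((nx, ny))
    return selected


if __name__ == "__main__":
    # Toy relevance: tiles closer to (10, 10) are "more informative".
    target_score = lambda t: -(abs(t[0] - 10) + abs(t[1] - 10))
    print(navigate(20, 20, target_score, n_rounds=25))
```

Keeping earlier picks in the pool is what makes the chain monotone: a round can only replace a selected tile with one the scorer prefers.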
Pillar-0: A New Frontier for Radiology Foundation Models
Agrawal, Kumar Krishna, Liu, Longchao, Lian, Long, Nercessian, Michael, Harguindeguy, Natalia, Wu, Yufu, Mikhael, Peter, Lin, Gigin, Sequist, Lecia V., Fintelmann, Florian, Darrell, Trevor, Bai, Yutong, Chung, Maggie, Yala, Adam
Radiology plays an integral role in modern medicine, yet rising imaging volumes have far outpaced workforce growth. Foundation models offer a path toward assisting with the full spectrum of radiology tasks, but existing medical models remain limited: they process volumetric CT and MRI as low-fidelity 2D slices, discard critical grayscale contrast information, and lack evaluation frameworks that reflect real clinical practice. We introduce Pillar-0, a radiology foundation model pretrained on 42,990 abdomen-pelvis CTs, 86,411 chest CTs, 14,348 head CTs, and 11,543 breast MRIs from a large academic center, together with RATE, a scalable framework that uses LLMs to extract structured labels for 366 radiologic findings with near-perfect accuracy. Across internal test sets of 14,230 abdomen-pelvis CTs, 10,646 chest CTs, 4,906 head CTs, and 1,585 breast MRIs, Pillar-0 establishes a new performance frontier, achieving mean AUROCs of 86.4, 88.0, 90.1, and 82.9, respectively, outperforming MedGemma (Google), MedImageInsight (Microsoft), Lingshu (Alibaba), and Merlin (Stanford) by 7.8-15.8 AUROC points and ranking best in 87.2% (319/366) of tasks. Pillar-0 similarly outperforms all baselines in an external validation on the Stanford Abdominal CT dataset, including Merlin (82.2 vs 80.6 AUROC). Pillar-0 also extends to tasks beyond its pretraining, such as long-horizon lung cancer risk prediction, where it improves upon the state-of-the-art Sybil by 3.0 C-index points on NLST and generalizes to the MGH and CGMH cohorts with gains of 5.9 and 1.9 points, respectively. In brain hemorrhage detection, Pillar-0 achieved an AUROC above 95 using only 1/20th of the data required by the next most sample-efficient baseline. Together, Pillar-0 and RATE provide an open, clinically rigorous foundation for building high-performance radiology systems, enabling applications that were previously infeasible due to computational, data, and evaluation constraints.
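The lung cancer risk results are stated in C-index points. For readers unfamiliar with the metric, a minimal sketch of Harrell's concordance index for right-censored data follows; it is the textbook definition, not the evaluation code used for Pillar-0 or Sybil. It returns a value in [0, 1]; the abstract appears to report such scores multiplied by 100, as it does for AUROC, but that scaling is an assumption here.

```python
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[int],
                    risks: Sequence[float]) -> float:
    """Harrell's C for right-censored data.

    A pair (i, j) is comparable when subject i had the event (events[i] == 1)
    strictly before subject j's observed time. It is concordant when the model
    assigns i the higher risk; tied risks count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Toy cohort: follow-up years, event indicator (1 = cancer diagnosed), model risk.
    print(harrell_c_index([1, 2, 3, 4], [1, 1, 1, 0], [0.9, 0.7, 0.8, 0.1]))
```

Unlike AUROC at a single horizon, the C-index rewards correct ordering of time-to-event across the whole follow-up, which is why it suits long-horizon risk prediction.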
PRISM2: Unlocking Multi-Modal General Pathology AI with Clinical Dialogue
Vorontsov, Eugene, Shaikovski, George, Casson, Adam, Viret, Julian, Zimmermann, Eric, Tenenholtz, Neil, Wang, Yi Kan, Bernhard, Jan H., Godrich, Ran A., Retamero, Juan A., Shia, Jinru, Gonen, Mithat, Weiser, Martin R., Klimstra, David S., Yousfi, Razik, Fusi, Nicolo, Fuchs, Thomas J., Severson, Kristen, Liu, Siqi
Recent rapid progress in computational pathology has been enabled by foundation models. These models are beginning to move beyond encoding image patches toward whole-slide understanding, but their clinical utility remains limited. In this work, we present PRISM2, a multimodal slide-level foundation model trained on data from 700,000 diagnostic specimen-report pairs, the largest vision (2.3 million whole slide images) and language (14M question-answer pairs) histopathology dataset to date. By learning through clinical-dialogue supervision, PRISM2 aligns histomorphologic features with the language of diagnostic reasoning, producing slide-level representations that support both direct diagnostic question answering and transferable embeddings for downstream tasks. Without additional training, PRISM2 matches or exceeds the cancer-detection performance of clinical-grade products. This comes without loss of generality: PRISM2 also achieves top performance on other tasks. Finally, using survival prediction as an example, we show that task-specific finetuning with a large dataset can outperform task-specific models, further improving performance. These results demonstrate how language-supervised pretraining provides a scalable, clinically grounded signal for learning generalizable pathology representations, bridging human diagnostic reasoning and foundation-model performance.
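One common way to use "transferable embeddings for downstream tasks" is a linear probe: the foundation model stays frozen and only a logistic-regression head is trained on its slide embeddings. The sketch below shows that generic technique; it is not PRISM2's evaluation protocol, and the learning rate, epoch count, and toy embeddings are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_linear_probe(embeddings: Sequence[Sequence[float]], labels: Sequence[int],
                       lr: float = 0.5, epochs: int = 300) -> Tuple[List[float], float]:
    """Full-batch gradient descent on the logistic loss; the embeddings stay frozen."""
    d, n = len(embeddings[0]), len(embeddings)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(embeddings, labels):
            err = sigmoid(sum(wk * xk for wk, xk in zip(w, x)) + b) - y
            for k in range(d):
                grad_w[k] += err * x[k]
            grad_b += err
        w = [wk - lr * g / n for wk, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    return sigmoid(sum(wk * xk for wk, xk in zip(w, x)) + b)


if __name__ == "__main__":
    # Toy 2-D "slide embeddings" with binary labels (e.g. tumour present / absent).
    X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    y = [1, 1, 0, 0]
    w, b = train_linear_probe(X, y)
    print(predict_proba(w, b, [0.8, 0.2]))
```

A linear probe measures how much task information the frozen embedding already exposes, whereas the task-specific finetuning mentioned above also updates the encoder.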
Robust Pan-Cancer Mitotic Figure Detection with YOLOv12
Bourgade, Raphaël, Balezo, Guillaume, Feki, Hana, Monier, Lily, Blons, Matthieu, Blondel, Alice, Loussouarn, Delphine, Vincent-Salomon, Anne, Walter, Thomas
Detecting mitotic figures (MFs) in histopathology images remains a challenging task. Their quantification traditionally relies on the manual identification of "hot spots" by pathologists, followed by visual counting, an approach that is inherently subjective and may not reliably reflect the true proliferative activity of a tumor. With the rise of digital pathology and artificial intelligence, numerous efforts have been made to automate mitosis detection in order to improve accuracy, reproducibility, and scalability. Among these, the MItosis DOmain Generalization (MIDOG) challenges have emerged as a key benchmark for evaluating the generalizability of detection algorithms under realistic domain shifts. The 2021 edition (1) addressed scanner-induced variability using breast cancer WSIs, while the 2022 edition (2) extended the scope to multiple tissue types and species, introducing further biological diversity. The 2025 MIDOG challenge (3) builds on these foundations with the most comprehensive mitosis-annotated dataset to date and introduces two tasks: (1) detecting mitotic figures in arbitrary tumor tissue, and (2) determining whether a mitotic figure is atypical or normal. These tasks represent a significant step toward robust mitosis detection systems that generalize across diverse and complex histological conditions. In this work, we present a high-performance detection pipeline based on the YOLOv12 object detection architecture.
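Detection benchmarks of this kind typically score predicted mitotic-figure centroids against annotated ones: a prediction counts as a true positive when it lies within a fixed distance of a still-unmatched annotation, and F1 is computed from the resulting counts. The sketch below assumes that protocol. The radius, the greedy nearest-first matching (rather than an optimal Hungarian assignment), and all names are illustrative assumptions, not the official MIDOG evaluation code.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) centroid in pixels


def match_detections(preds: Sequence[Point], gts: Sequence[Point],
                     radius: float) -> Tuple[int, int, int]:
    """Greedy one-to-one matching of predictions to ground truth within `radius`.
    Returns (true positives, false positives, false negatives)."""
    pairs: List[Tuple[float, int, int]] = []
    for i, p in enumerate(preds):
        for j, g in enumerate(gts):
            d = math.dist(p, g)
            if d <= radius:
                pairs.append((d, i, j))
    pairs.sort()  # closest pairs are matched first
    used_p, used_g = set(), set()
    for _, i, j in pairs:
        if i not in used_p and j not in used_g:
            used_p.add(i)
            used_g.add(j)
    tp = len(used_p)
    return tp, len(preds) - tp, len(gts) - tp


def f1_score(tp: int, fp: int, fn: int) -> float:
    denom = 2 * tp + fp + fn
    # No predictions and no annotations: define F1 as 0 here (a convention).
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    preds = [(0, 0), (10, 10), (50, 50)]
    gts = [(1, 1), (10, 12), (100, 100)]
    print(f1_score(*match_detections(preds, gts, radius=5.0)))
```

Enforcing one-to-one matching prevents a cluster of duplicate boxes on a single mitosis from being rewarded more than once.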
Generalisation of automatic tumour segmentation in histopathological whole-slide images across multiple cancer types
Skrede, Ole-Johan, Pradhan, Manohar, Isaksen, Maria Xepapadakis, Hveem, Tarjei Sveinsgjerd, Vlatkovic, Ljiljana, Nesbakken, Arild, Lindemann, Kristina, Kristensen, Gunnar B, Kasius, Jenneke, Zeimet, Alain G, Brustugun, Odd Terje, Busund, Lill-Tove Rasmussen, Richardsen, Elin H, Haug, Erik Skaaheim, Brennhovd, Bjørn, Rewcastle, Emma, Lillesand, Melinda, Kvikstad, Vebjørn, Janssen, Emiel, Kerr, David J, Liestøl, Knut, Albregtsen, Fritz, Kleppe, Andreas
Deep learning is expected to aid pathologists by automating tasks such as tumour segmentation. We aimed to develop one universal tumour segmentation model for histopathological images and examine its performance in different cancer types. The model was developed using over 20 000 whole-slide images from over 4 000 patients with colorectal, endometrial, lung, or prostate carcinoma. Performance was validated in pre-planned analyses on external cohorts with over 3 000 patients across six cancer types. Exploratory analyses included over 1 500 additional patients from The Cancer Genome Atlas. Average Dice coefficient was over 80% in all validation cohorts with en bloc resection specimens and in The Cancer Genome Atlas cohorts. No loss of performance was observed when comparing the universal model with models specialised on single cancer types. In conclusion, extensive and rigorous evaluations demonstrate that generic tumour segmentation by a single model is possible across cancer types, patient populations, sample preparations, and slide scanners.
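The headline metric above is the Dice coefficient, 2|A ∩ B| / (|A| + |B|) for a predicted mask A and a reference mask B. A minimal sketch on binary masks follows. Whether the reported average is taken per slide or pooled over pixels is not stated here, so the per-slide mean below is an assumption, as is the convention for two empty masks.

```python
from typing import Iterable, Sequence, Tuple

Mask = Sequence[Sequence[int]]  # 2-D binary mask: 1 = tumour, 0 = background


def dice(pred: Mask, target: Mask) -> float:
    """Dice = 2 * |A and B| / (|A| + |B|) for binary masks of equal shape."""
    inter = pred_sum = target_sum = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += int(bool(p) and bool(t))
            pred_sum += int(bool(p))
            target_sum += int(bool(t))
    denom = pred_sum + target_sum
    # Both masks empty: treat as perfect agreement (a common convention).
    return 1.0 if denom == 0 else 2 * inter / denom


def mean_dice(pairs: Iterable[Tuple[Mask, Mask]]) -> float:
    """Per-slide average of Dice scores (assumed aggregation)."""
    scores = [dice(p, t) for p, t in pairs]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    pred = [[1, 1, 0], [0, 0, 0]]
    target = [[1, 0, 0], [0, 0, 0]]
    print(dice(pred, target))
```

Because Dice normalises by the combined mask area, a small tumour region that is partly missed is penalised much more than the same pixel error on a large region.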
GAS-MIL: Group-Aggregative Selection Multi-Instance Learning for Ensemble of Foundation Models in Digital Pathology Image Analysis
Quan, Peiran, Gu, Zifan, Zhao, Zhuo, Zhou, Qin, Yang, Donghan M., Rong, Ruichen, Xie, Yang, Xiao, Guanghua
Foundation models (FMs) have transformed computational pathology by providing powerful, general-purpose feature extractors. However, adapting and benchmarking individual FMs for specific diagnostic tasks is often time-consuming and resource-intensive, especially given their scale and diversity. To address this challenge, we introduce Group-Aggregative Selection Multi-Instance Learning (GAS-MIL), a flexible ensemble framework that seamlessly integrates features from multiple FMs, preserving their complementary strengths without requiring manual feature selection or extensive task-specific fine-tuning. Across classification tasks in three cancer datasets, namely prostate (PANDA), ovarian (UBC-OCEAN), and breast (TCGA-BrCa), GAS-MIL consistently achieves superior or on-par performance relative to individual FMs and established MIL methods, demonstrating its robustness and generalizability. By enabling efficient integration of heterogeneous FMs, GAS-MIL streamlines model deployment for pathology and provides a scalable foundation for future multimodal and precision oncology applications.
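To make "integrating features from multiple FMs" in a multi-instance setting concrete, the sketch below concatenates each patch's features from several foundation models and pools the patches into one slide-level vector with softmax attention. This is a generic attention-MIL baseline with a single linear scoring vector, not the group-aggregative selection mechanism of GAS-MIL, whose details are not given here; the scoring weights and toy features are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def concat_fm_features(per_fm: Sequence[Sequence[Vector]]) -> List[Vector]:
    """per_fm[m][i] is the feature of instance (patch) i from foundation model m.
    Returns one concatenated vector per instance."""
    n_instances = len(per_fm[0])
    return [[v for fm in per_fm for v in fm[i]] for i in range(n_instances)]


def attention_pool(instances: Sequence[Vector],
                   w: Sequence[float]) -> Tuple[Vector, Vector]:
    """Softmax attention over instances using a linear scoring vector w
    (simplified: no hidden layer). Returns (bag embedding, attention weights)."""
    scores = [sum(a * b for a, b in zip(w, h)) for h in instances]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    attn = [e / z for e in exps]
    d = len(instances[0])
    bag = [sum(a * h[k] for a, h in zip(attn, instances)) for k in range(d)]
    return bag, attn


if __name__ == "__main__":
    fm_a = [[1.0, 0.0], [0.0, 1.0]]   # 2 patches, 2-D features from FM "A"
    fm_b = [[0.5], [0.2]]             # same 2 patches, 1-D features from FM "B"
    feats = concat_fm_features([fm_a, fm_b])
    bag, attn = attention_pool(feats, w=[2.0, 0.0, 1.0])  # hypothetical weights
    print(attn, bag)
```

Plain concatenation lets the attention scorer weigh every model's features jointly; a selection-based ensemble such as the one described above would instead decide which models or feature groups contribute.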