Meseguer, Pablo
Exploring visual language models as a powerful tool in the diagnosis of Ewing Sarcoma
Pastor-Naranjo, Alvaro, Meseguer, Pablo, del Amor, Rocío, Lopez-Guerrero, Jose Antonio, Navarro, Samuel, Scotlandi, Katia, Llombart-Bosch, Antonio, Machado, Isidro, Naranjo, Valery
Ewing's sarcoma (ES), characterized by a high density of small round blue cells without structural organization, presents a significant health concern, particularly among adolescents aged 10 to 19. Artificial intelligence-based systems for automated analysis of histopathological images are promising to contribute to an accurate diagnosis of ES. In this context, this study explores the feature extraction ability of different pre-training strategies for distinguishing ES from other soft tissue or bone sarcomas with similar morphology in digitized tissue microarrays for the first time, as far as we know. Vision-language supervision (VLS) is compared to fully-supervised ImageNet pre-training within a multiple instance learning paradigm. Our findings indicate a substantial improvement in diagnostic accuracy with the adaption of VLS using an in-domain dataset. Notably, these models not only enhance the accuracy of predicted classes but also drastically reduce the number of trainable parameters and computational costs.
Enhancing Whole Slide Image Classification through Supervised Contrastive Domain Adaptation
Carretero, Ilán, Meseguer, Pablo, del Amor, Rocío, Naranjo, Valery
Domain shift in the field of histopathological imaging is a common phenomenon due to the intra- and inter-hospital variability of staining and digitization protocols. The implementation of robust models, capable of creating generalized domains, represents a need to be solved. In this work, a new domain adaptation method to deal with the variability between histopathological images from multiple centers is presented. In particular, our method adds a training constraint to the supervised contrastive learning approach to achieve domain adaptation and improve inter-class separability. Experiments performed on domain adaptation and classification of whole-slide images of six skin cancer subtypes from two centers demonstrate the method's usefulness. The results reflect superior performance compared to not using domain adaptation after feature extraction or staining normalization.
MI-VisionShot: Few-shot adaptation of vision-language models for slide-level classification of histopathological images
Meseguer, Pablo, del Amor, Rocío, Naranjo, Valery
Vision-language supervision has made remarkable strides in learning visual representations from textual guidance. In digital pathology, vision-language models (VLM), pre-trained on curated datasets of histological image-captions, have been adapted to downstream tasks, such as region of interest classification. Zero-shot transfer for slide-level prediction has been formulated by MI-Zero [1], but it exhibits high variability depending on the textual prompts. Inspired by prototypical learning, we propose MI-VisionShot, a training-free adaptation method on top of VLMs to predict slide-level labels in few-shot learning scenarios. Our framework takes advantage of the excellent representation learning of VLM to create prototype-based classifiers under a multipleinstance setting by retrieving the most discriminative patches within each slide. Experimentation through different settings shows the ability of MI-VisionShot to surpass zero-shot transfer with lower variability, even in low-shot scenarios.