MI-VisionShot: Few-shot adaptation of vision-language models for slide-level classification of histopathological images

Meseguer, Pablo, del Amor, Rocío, Naranjo, Valery

arXiv.org Artificial Intelligence 

Vision-language supervision has made remarkable strides in learning visual representations from textual guidance. In digital pathology, vision-language models (VLMs), pre-trained on curated datasets of histological image-caption pairs, have been adapted to downstream tasks such as region-of-interest classification. Zero-shot transfer for slide-level prediction was formulated by MI-Zero [1], but it exhibits high variability depending on the textual prompts. Inspired by prototypical learning, we propose MI-VisionShot, a training-free adaptation method on top of VLMs that predicts slide-level labels in few-shot learning scenarios. Our framework takes advantage of the strong representation learning of VLMs to build prototype-based classifiers under a multiple-instance setting by retrieving the most discriminative patches within each slide. Experiments across different settings show that MI-VisionShot surpasses zero-shot transfer with lower variability, even in low-shot scenarios.
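To make the idea concrete, below is a minimal sketch of prototype-based, multiple-instance slide classification in the spirit described above. It is not the authors' implementation: the function names, the use of text-prompt similarity to score patches, the top-k value, and the mean-pooling of selected patches are all illustrative assumptions layered on pre-computed VLM embeddings.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Normalize vectors so that dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def slide_embedding(patch_embs, class_text_embs, top_k=16):
    """Pool one slide into a single vector from its most discriminative patches.

    patch_embs:      (n_patches, d) VLM image embeddings of the slide's patches.
    class_text_embs: (n_classes, d) VLM text embeddings of class prompts.
    Each patch is scored by its best cosine similarity to any class prompt
    (an assumed scoring rule), and the top-k patches are mean-pooled.
    """
    p = l2_normalize(patch_embs)
    t = l2_normalize(class_text_embs)
    scores = (p @ t.T).max(axis=1)          # (n_patches,) best match per patch
    k = min(top_k, scores.shape[0])
    idx = np.argsort(scores)[-k:]           # indices of the top-k patches
    return l2_normalize(p[idx].mean(axis=0))

def build_prototypes(support_slides, support_labels, class_text_embs,
                     n_classes, top_k=16):
    """Average the few-shot support slides of each class into one prototype."""
    embs = np.stack([slide_embedding(s, class_text_embs, top_k)
                     for s in support_slides])
    labels = np.asarray(support_labels)
    return np.stack([l2_normalize(embs[labels == c].mean(axis=0))
                     for c in range(n_classes)])

def classify(query_patch_embs, prototypes, class_text_embs, top_k=16):
    """Assign a query slide to the nearest class prototype (training-free)."""
    q = slide_embedding(query_patch_embs, class_text_embs, top_k)
    return int(np.argmax(prototypes @ q))
```

Because the classifier is just nearest-prototype matching over frozen VLM embeddings, adaptation requires no gradient updates, and the few-shot support set replaces hand-crafted prompts as the main source of class evidence, which is consistent with the lower variability reported relative to zero-shot transfer.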