Expanding Relevance Judgments for Medical Case-based Retrieval Task with Multimodal LLMs
Pires, Catarina, Nunes, Sérgio, Teixeira, Luís Filipe
arXiv.org Artificial Intelligence
Evaluating Information Retrieval (IR) systems relies on high-quality manual relevance judgments (qrels), which are costly and time-consuming to obtain. While pooling reduces the annotation effort, it results in only partially labeled datasets. Large Language Models (LLMs) offer a promising way to reduce reliance on manual judgments, particularly in complex domains like medical case-based retrieval, where relevance assessment requires analyzing both textual and visual information. In this work, we explore using a Multimodal Large Language Model (MLLM) to expand relevance judgments, creating a new dataset of automated judgments. Specifically, we employ Gemini 1.5 Pro on the ImageCLEFmed 2013 case-based retrieval task, simulating human assessment through an iteratively refined, structured prompting strategy that integrates binary scoring, instruction-based evaluation, and few-shot learning. We systematically experimented with various prompt configurations to maximize agreement with human judgments. To evaluate agreement between the MLLM and human judgments, we use Cohen's Kappa, achieving a substantial agreement score of 0.6, comparable to inter-annotator agreement typically observed in multimodal retrieval tasks. Starting from the original 15,028 manual judgments (4.72% relevant) across 35 topics, our MLLM-based approach expanded the dataset by over 37x to 558,653 judgments, increasing relevant annotations to 5,950. On average, each medical case query received 15,398 new annotations, with approximately 99% being non-relevant, reflecting the high sparsity typical in this domain. Our results demonstrate the potential of MLLMs to scale relevance judgment collection, offering a promising direction for supporting retrieval evaluation in medical and multimodal IR tasks.
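The agreement measure named in the abstract, Cohen's Kappa, can be illustrated with a minimal sketch. The function name `cohen_kappa` and the toy label lists below are hypothetical and are not taken from the paper; they only show how observed agreement is corrected for chance agreement when comparing binary relevance labels from a human assessor and a model.

```python
from collections import Counter


def cohen_kappa(human, model):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(human) != len(model) or not human:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(human)
    # Observed agreement: fraction of items where both raters give the same label.
    p_o = sum(h == m for h, m in zip(human, model)) / n
    # Chance agreement: for each label, the product of the two raters'
    # marginal proportions, summed over labels.
    h_counts = Counter(human)
    m_counts = Counter(model)
    p_e = sum(h_counts[k] * m_counts[k] for k in h_counts) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label throughout; kappa is
        # undefined (0/0), so this sketch reports full agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Toy data (hypothetical): 1 = relevant, 0 = non-relevant.
human = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
model = [1, 0, 0, 0, 0, 0, 0, 0, 1, 1]
kappa = cohen_kappa(human, model)
print(f"kappa = {kappa:.3f}")
```

In this toy case the raters agree on 8 of 10 items, but because non-relevant labels dominate, chance agreement is already 0.58, so kappa is noticeably lower than the raw agreement rate. The same effect matters for sparse qrels like those described above, where most judgments are non-relevant.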
Jun-24-2025
- Country:
  - Asia
    - Japan > Honshū
      - Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
    - Myanmar > Tanintharyi Region
      - Dawei (0.04)
  - Europe
    - Switzerland > Geneva
      - Geneva (0.04)
    - Ireland > Leinster
      - County Dublin > Dublin (0.04)
    - Greece > Ionian Islands
      - Corfu (0.04)
    - Italy (0.05)
    - United Kingdom > England
      - South Yorkshire > Sheffield (0.04)
    - Hungary > Budapest
      - Budapest (0.04)
    - Portugal > Porto
      - Porto (0.04)
    - Spain > Valencian Community
      - Valencia Province > Valencia (0.04)
    - Netherlands > North Holland
      - Amsterdam (0.04)
    - Austria > Vienna (0.14)
  - North America > United States
    - District of Columbia > Washington (0.04)
    - Louisiana > Orleans Parish
      - New Orleans (0.04)
    - Pennsylvania > Allegheny County
      - Pittsburgh (0.04)
- Genre:
  - Research Report > New Finding (0.86)
- Industry:
  - Health & Medicine > Diagnostic Medicine > Imaging (0.47)
- Technology: