
Collaborating Authors

Figueredo, Carlos Milan


Evaluating Large Language Models on the Spanish Medical Intern Resident (MIR) Examination 2024/2025: A Comparative Analysis of Clinical Reasoning and Knowledge Application

arXiv.org Artificial Intelligence

The MIR serves as a critical selection mechanism for medical graduates entering specialized training in Spain. This study examines how well generative AI models meet the challenges posed by the MIR, with emphasis on clinical reasoning, image interpretation, and epidemiological calculations. It evaluates LLM performance in complex clinical scenarios and explores the extent to which LLMs demonstrate medical reasoning beyond mere information recall.

Findings. The results provide key insights into the performance of 22 LLMs on the MIR 2024 and 2025 exams. The exam comprises 210 multiple-choice questions covering diverse medical domains and incorporates case-based scenarios, image interpretation (25 questions), and laboratory data analysis.