Evaluating Large Language Models on the Spanish Medical Intern Resident (MIR) Examination 2024/2025: A Comparative Analysis of Clinical Reasoning and Knowledge Application

Vera, Carlos Luengo; Picon, Ignacio Ferro; Nunez, M. Teresa del Val; Gandia, Jose Andres Gomez; Ancillo, Antonio de Lucas; Arroyo, Victor Ramos; Figueredo, Carlos Milan

arXiv.org Artificial Intelligence 

The MIR serves as a critical selection mechanism for medical graduates entering specialized training in Spain. This study examines the ability of generative AI models to meet the challenges presented by the MIR, with emphasis on clinical reasoning, image interpretation, and epidemiological calculations. It evaluates LLM performance in complex clinical scenarios and explores the extent to which LLMs demonstrate medical reasoning beyond mere information recall. Findings: The results reveal key insights into the performance of 22 LLMs on the MIR 2024 and 2025 exams. The exam features 210 multiple-choice questions covering diverse medical domains and incorporates case-based scenarios, image interpretation (25 questions), and laboratory data analysis.
