Evaluating Large Language Models on the Spanish Medical Intern Resident (MIR) Examination 2024/2025: A Comparative Analysis of Clinical Reasoning and Knowledge Application

Carlos Luengo Vera, Ignacio Ferro Picon, M. Teresa del Val Nunez, Jose Andres Gomez Gandia, Antonio de Lucas Ancillo, Victor Ramos Arroyo, Carlos Milan Figueredo

Mar-16-2025–arXiv.org Artificial Intelligence

The Spanish Medical Intern Resident (MIR) examination serves as a critical selection mechanism for medical graduates entering specialized training in Spain. This study examines the ability of generative AI models to meet the challenges posed by the MIR, with emphasis on clinical reasoning, image interpretation, and epidemiological calculations. It evaluates LLM performance in complex clinical scenarios and explores the extent to which LLMs demonstrate medical reasoning beyond mere information recall.

Findings: The study reports the performance of 22 LLMs on the MIR 2024 and 2025 exams. The exam comprises 210 multiple-choice questions covering diverse medical domains and incorporates case-based scenarios, image interpretation (25 questions), and laboratory data analysis.
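An evaluation like the one described above comes down to comparing each model's chosen option with the answer key, both over the whole exam and on subsets such as the 25 image-interpretation questions. The following is a minimal sketch under stated assumptions, not the authors' pipeline: the `Question` schema, the `accuracy`, `score_model`, and `rank_models` helpers, and the single-letter option labels are all hypothetical, and the sketch reports plain accuracy while ignoring any penalty for wrong answers that an official scoring scheme might apply.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    """One multiple-choice item (hypothetical schema, not the authors' data format)."""
    qid: int
    correct: str             # answer key, e.g. "A".."D" (assumed option labels)
    has_image: bool = False  # True for image-interpretation items


def accuracy(questions, answers):
    """Fraction of `questions` whose answer in `answers` matches the key.

    `answers` maps question id -> chosen option; a missing id counts as wrong.
    """
    if not questions:
        raise ValueError("no questions to score")
    hits = sum(1 for q in questions if answers.get(q.qid) == q.correct)
    return hits / len(questions)


def score_model(questions, answers):
    """Overall accuracy plus accuracy on the image-based subset (None if empty)."""
    image_qs = [q for q in questions if q.has_image]
    return {
        "overall": accuracy(questions, answers),
        "image": accuracy(image_qs, answers) if image_qs else None,
    }


def rank_models(questions, answers_by_model):
    """Return (model name, overall accuracy) pairs, best first."""
    scores = {name: accuracy(questions, ans) for name, ans in answers_by_model.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy four-question exam (made-up data, for illustration only).
exam = [
    Question(1, "A"),
    Question(2, "C", has_image=True),
    Question(3, "B"),
    Question(4, "D", has_image=True),
]
model_answers = {1: "A", 2: "B", 3: "B", 4: "D"}
print(score_model(exam, model_answers))  # {'overall': 0.75, 'image': 0.5}
```

Treating a missing answer as wrong keeps unanswered items in the denominator, so a model that declines to answer is not rewarded with a higher accuracy.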

large language model, machine learning, natural language

Country:
- Europe > Spain > Community of Madrid > Madrid (0.04)

Genre:
- Research Report (1.00)

Industry:
- Health & Medicine > Diagnostic Medicine (1.00)
- Education > Educational Setting
  - Higher Education (0.31)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning > Generative AI (0.50)