Performance of Large Language Models in Answering Critical Care Medicine Questions

Alwakeel, Mahmoud, Nagori, Aditya, Wong, An-Kwok Ian, Chaisson, Neal, Krishnamoorthy, Vijay, Kamaleswaran, Rishikesan

arXiv.org Artificial Intelligence 

Abstract: Large Language Models have been tested on medical student-level questions, but their performance in specialized fields such as Critical Care Medicine (CCM) is less explored. This study evaluated Meta-Llama 3.1 models (8B and 70B parameters) on 871 CCM questions. Performance varied across domains: it was highest in Research (68.4%) and lowest in Renal (47.9%), underscoring the need for further work to improve model performance across subspecialty domains.

Introduction: The use of Large Language Models (LLMs) to answer medical exam-style questions has gained popularity in recent years. This study aims to evaluate the performance of LLMs in answering subspecialty CCM board exam-style questions.
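
To make the evaluation setup concrete, below is a minimal sketch, not the authors' actual pipeline, of how per-domain accuracy on multiple-choice questions could be computed for a Llama 3.1 model. The question record layout, prompt wording, and answer-extraction rule are assumptions; the model ID is the public Hugging Face release of Meta-Llama 3.1 8B Instruct.

    # Minimal sketch: score a Llama 3.1 model on multiple-choice questions
    # and report accuracy per CCM domain. Question format is hypothetical.
    from collections import defaultdict
    import re
    import transformers

    pipe = transformers.pipeline(
        "text-generation",
        model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    )

    # Hypothetical record layout: stem, lettered options, answer key, domain.
    questions = [
        {"stem": "...",
         "options": {"A": "...", "B": "...", "C": "...", "D": "..."},
         "answer": "A",
         "domain": "Renal"},
    ]

    correct, total = defaultdict(int), defaultdict(int)
    for q in questions:
        opts = "\n".join(f"{k}. {v}" for k, v in q["options"].items())
        prompt = (f"{q['stem']}\n{opts}\n"
                  "Answer with the single letter of the best option.\nAnswer:")
        # Greedy decoding; generated_text echoes the prompt, so slice it off.
        out = pipe(prompt, max_new_tokens=5, do_sample=False)[0]["generated_text"]
        m = re.search(r"\b([A-D])\b", out[len(prompt):])
        total[q["domain"]] += 1
        if m and m.group(1) == q["answer"]:
            correct[q["domain"]] += 1

    for domain in sorted(total):
        print(f"{domain}: {100 * correct[domain] / total[domain]:.1f}%")

Parsing a single letter out of free-form model output is the simplest scoring rule; a production evaluation would typically constrain decoding or compare option log-likelihoods instead.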