Evaluating Hierarchical Clinical Document Classification Using Reasoning-Based LLMs

Akram Mustafa, Usman Naseem, Mostafa Rahimi Azghadi

arXiv.org Artificial Intelligence 

Background: Clinical coding, particularly the classification of hierarchical ICD-10 codes from unstructured discharge summaries, is essential for healthcare operations but remains a labor-intensive and error-prone task. Automated approaches using Large Language Models (LLMs) offer the potential to augment or replace human coders, yet their reliability and reasoning capabilities, which are needed to ensure accurate, explainable code assignments, are not well understood. Objective: This study aims to benchmark a diverse set of LLMs, both reasoning and non-reasoning models, on their ability to classify hierarchical ICD-10 codes from discharge summaries, and to evaluate the effect of structured reasoning on model performance. Methods: Using the MIMIC-IV dataset, the study selected 1,500 discharge summaries labeled with the top 10 most frequent ICD-10 codes, balancing dataset size against the high computational and financial cost of using LLMs. We first preprocessed the data to extract clinically relevant tokens before feeding it to the LLMs; specifically, we used cTAKES, a clinical NLP tool, to identify medical concepts. Each summary was encoded and submitted to 11 LLMs using a standardized, structured prompt simulating a clinical coder. Models were evaluated using the F1 score across three ICD-10 hierarchy levels, for both primary-diagnosis and all-diagnoses classification tasks. Results: Reasoning models on average outperformed non-reasoning models, and the Gemini 2.5 Pro model demonstrated the highest performance across tasks.
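As a rough illustration of the level-wise evaluation the abstract describes, the Python sketch below truncates ICD-10 codes to three levels of specificity and computes a micro-averaged F1 over per-document code sets at each level. The level cut-offs (3-character category, 4-character subcategory, full code) and all function names here are illustrative assumptions, not the authors' implementation or the paper's exact hierarchy scheme.

from typing import List, Set

def truncate(code: str, level: int) -> str:
    # Map a full ICD-10 code (e.g. 'I10', 'E11.9') to a coarser level.
    # The cut-offs are assumed for illustration only.
    plain = code.replace(".", "")
    cutoffs = {1: 3, 2: 4, 3: len(plain)}
    return plain[: cutoffs[level]]

def micro_f1(gold: List[Set[str]], pred: List[Set[str]], level: int) -> float:
    # Micro-averaged F1 over per-document code sets at one hierarchy level.
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        g_l = {truncate(c, level) for c in g}
        p_l = {truncate(c, level) for c in p}
        tp += len(g_l & p_l)
        fp += len(p_l - g_l)
        fn += len(g_l - p_l)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Example: two discharge summaries, gold vs. model-predicted code sets.
# 'J44.1' vs 'J44.9' match at the category level but not at finer levels,
# which is exactly the leniency a hierarchy-aware score captures.
gold = [{"E11.9", "I10"}, {"J44.1"}]
pred = [{"E11.9"}, {"J44.9"}]
for level in (1, 2, 3):
    print(f"Level {level} micro-F1: {micro_f1(gold, pred, level):.2f}")

Scoring set overlaps at progressively coarser code prefixes is one common way to realize "F1 across three ICD-10 levels"; the paper may define the levels or the averaging differently.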
