Model selection meets clinical semantics: Optimizing ICD-10-CM prediction via LLM-as-Judge evaluation, redundancy-aware sampling, and section-aware fine-tuning

Dai, Hong-Jie, Li, Zheng-Hao, Lu, An-Tai, Shain, Bo-Tsz, Li, Ming-Ta, Mir, Tatheer Hussain, Wang, Kuang-Te, Su, Min-I, Liu, Pei-Kang, Tsai, Ming-Ju

Sep-24-2025–arXiv.org Artificial Intelligence

Accurate International Classification of Diseases (ICD) coding is critical for clinical documentation, billing, and healthcare analytics, yet it remains a labour-intensive and error-prone task. Although large language models (LLMs) show promise in automating ICD coding, their challenges in base model selection, input contextualization, and training data redundancy limit their effectiveness. We propose a modular framework for ICD-10 Clinical Modification (ICD-10-CM) code prediction that addresses these challenges through principled model selection, redundancy-aware data sampling, and structured input design. The framework integrates an LLM-as-judge evaluation protocol with Plackett-Luce aggregation to assess and rank open-source LLMs based on their intrinsic comprehension of ICD-10-CM code definitions. We introduced embedding-based similarity measures, a redundancy-aware sampling strategy to remove semantically duplicated discharge summaries. We leverage structured discharge summaries from Taiwanese hospitals to evaluate contextual effects and examine section-wise content inclusion under universal and section-specific modelling paradigms. Experiments across two institutional datasets demonstrate that the selected base model after fine-tuning consistently outperforms baseline LLMs in internal and external evaluations. Incorporating more clinical sections consistently improves prediction performance. This study uses open-source LLMs to establish a practical and principled approach to ICD-10-CM code prediction. The proposed framework provides a scalable, institution-ready solution for real-world deployment of automated medical coding systems by combining informed model selection, efficient data refinement, and context-aware prompting.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

Sep-24-2025

arXiv.org PDF

Add feedback

Country:
- Asia > Taiwan (0.31)

Genre:
- Research Report > New Finding (1.00)

Industry:
- Health & Medicine
  - Health Care Providers & Services (1.00)
  - Health Care Technology > Medical Record (0.68)
  - Therapeutic Area > Oncology (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found