Reinforcement Learning for Out-of-Distribution Reasoning in LLMs: An Empirical Study on Diagnosis-Related Group Coding

Jun-11-2026, 19:52:57 GMT–Neural Information Processing Systems

Diagnosis-Related Group (DRG) codes are essential for hospital reimbursement and operations, but assigning them is labor-intensive. Large Language Models (LLMs) struggle with DRG coding because the task is out-of-distribution (OOD): pretraining corpora rarely contain private clinical or billing data. We introduce DRG-Sapphire, which uses large-scale reinforcement learning (RL) to assign DRG codes automatically from clinical notes. DRG-Sapphire is built on Qwen2.5-7B and trained with Group Relative Policy Optimization (GRPO) using rule-based rewards. It incorporates a series of RL enhancements that address domain-specific challenges not encountered in prior RL work on mathematical tasks. The model achieves state-of-the-art accuracy on the MIMIC-IV benchmark and generates physician-validated reasoning for its DRG assignments, significantly enhancing explainability.
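The abstract names two mechanisms, rule-based rewards and GRPO, without giving their exact form. The snippet below is a minimal sketch under stated assumptions. It assumes a hypothetical answer format in which the model ends its output with a line such as `DRG: 871`, and it assumes a binary exact-match reward; neither is specified in the source. It shows how a rule-based reward can score a group of sampled completions and how GRPO turns those scores into group-relative advantages by normalizing each reward against the mean and standard deviation of its own group. The names `rule_based_reward` and `group_relative_advantages` are illustrative only. The clipped policy-ratio objective and KL penalty of full GRPO are omitted.

```python
import re
import statistics

# HYPOTHETICAL answer format: the model is assumed to end its reasoning with a
# line like "DRG: 871" (a 3-digit code). The actual prompt/answer format used
# by DRG-Sapphire is not described here.
_DRG_PATTERN = re.compile(r"DRG:\s*(\d{3})")


def rule_based_reward(completion: str, gold_drg: str) -> float:
    """Illustrative binary reward (an assumption, not the paper's reward):
    1.0 if the extracted DRG code exactly matches the gold code, else 0.0,
    including when no code can be parsed from the completion."""
    match = _DRG_PATTERN.search(completion)
    if match is None:
        return 0.0
    return 1.0 if match.group(1) == gold_drg else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: each reward is normalized by the mean and
    (population) standard deviation of its own group of completions."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Four sampled completions for one clinical note whose gold DRG is "871".
    completions = [
        "... reasoning ... DRG: 871",
        "... reasoning ... DRG: 872",
        "... reasoning ... DRG: 871",
        "no code given",
    ]
    rewards = [rule_based_reward(c, "871") for c in completions]
    print(rewards)  # [1.0, 0.0, 1.0, 0.0]
    print([round(a, 3) for a in group_relative_advantages(rewards)])  # [1.0, -1.0, 1.0, -1.0]
```

Because advantages are computed within each group, a group in which every completion receives the same reward (all correct or all wrong) yields zero advantage for every sample and contributes no learning signal. This sketch uses the population standard deviation; implementations differ on that choice.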

artificial intelligence, large language model, natural language
