The NordDRG AI Benchmark for Large Language Models
–arXiv.org Artificial Intelligence
Large language models (LLMs) are being piloted for clinical coding and decision support, yet no open benchmark targets the hospital-funding layer where Diagnosis-Related Groups (DRGs) determine reimbursement. In most OECD systems, DRGs route a substantial share of multi-trillion-dollar health spending through governed grouper software, making transparency and auditability first-order concerns. We release NordDRG-AI-Benchmark, the first public, rule-complete test bed for DRG reasoning. The package includes (i) machine-readable approximately 20-sheet NordDRG definition tables and (ii) expert manuals and change-log templates that capture governance workflows. It exposes two suites: a 13-task Logic benchmark (code lookup, cross-table inference, grouping features, multilingual terminology, and CC/MCC validity checks) and a 13-task Grouper benchmark that requires full DRG grouper emulation with strict exact-match scoring on both the DRG and the triggering drg_logic.id. Lightweight reference agents (LogicAgent, GrouperAgent) enable artefact-only evaluation. Under an artefact-only (no web) setting, on the 13 Logic tasks GPT-5 Thinking and Opus 4.1 score 13/13, o3 scores 12/13; mid-tier models (GPT-5 Thinking Mini, o4-mini, GPT-5 Fast) achieve 6-8/13, and remaining models score 5/13 or below. On full grouper emulation across 13 tasks, GPT-5 Thinking solves 7/13, o3 6/13, o4-mini 3/13; GPT-5 Thinking Mini solves 1/13, and all other tested endpoints score 0/13. To our knowledge, this is the first public report of an LLM partially emulating the complete NordDRG grouper logic with governance-grade traceability. Coupling a rule-complete release with exact-match tasks and open scoring provides a reproducible yardstick for head-to-head and longitudinal evaluation in hospital funding. Benchmark materials available in Github.
arXiv.org Artificial Intelligence
Aug-21-2025
- Country:
- Europe
- Denmark > Capital Region
- Copenhagen (0.04)
- Finland (0.05)
- Slovenia > Upper Carniola
- Municipality of Bled > Bled (0.04)
- Denmark > Capital Region
- North America > United States (0.28)
- Oceania > Australia (0.04)
- Europe
- Genre:
- Research Report (1.00)
- Industry:
- Technology: