Benchmarking Chinese Medical LLMs: A Medbench-based Analysis of Performance Gaps and Hierarchical Optimization Strategies
Jiang, Luyi, Chen, Jiayuan, Lu, Lu, Peng, Xinwei, Liu, Lihao, He, Junjun, Xu, Jie
arXiv.org Artificial Intelligence
In recent years, large language models (LLMs), empowered by massive text corpora and deep learning techniques, have made breakthrough advances in cross-domain knowledge transfer and human-machine dialogue [1]. In healthcare, LLMs are increasingly deployed across nine core application scenarios, including intelligent diagnosis, personalized treatment, and drug discovery, and have drawn significant attention from both academia and industry [2, 3]. The development and evaluation of Chinese medical LLMs is a particularly important focus, because these models face unique challenges arising from the specialized nature of medical knowledge and the high stakes of clinical decision-making. Ensuring their reliability and safety has therefore become critical, which calls for rigorous evaluation frameworks [4]. Current research on medical LLM evaluation follows two main trends. On one hand, general-domain benchmarks (e.g., HELM [5], MMLU [6]) assess foundational model capabilities through medical knowledge tests. On the other hand, specialized medical evaluation systems (e.g., MedQA [7], C-Eval-Medical [8]) emphasize clinical reasoning and ethical compliance. Notably, the MedBench framework [9], jointly developed by institutions including Shanghai AI Laboratory, has emerged as the most influential benchmark for Chinese medical LLMs. It establishes a standardized evaluation system spanning five dimensions, including medical language comprehension, complex reasoning, and safety ethics, and has attracted participation from hundreds of research teams.
Mar-10-2025
- Genre:
- Research Report (0.50)
- Industry:
- Health & Medicine
- Diagnostic Medicine (1.00)
- Pharmaceuticals & Biotechnology (1.00)
- Therapeutic Area
- Gastroenterology (1.00)
- Hematology (0.68)
- Infections and Infectious Diseases (0.68)
- Oncology (1.00)