TCM-5CEval: Extended Deep Evaluation Benchmark for LLM's Comprehensive Clinical Research Competence in Traditional Chinese Medicine
Huang, Tianai, Chen, Jiayuan, Lu, Lu, Chen, Pengcheng, Li, Tianbin, Han, Bing, Tang, Wenchao, Xu, Jie, Li, Ming
–arXiv.org Artificial Intelligence
Large language models (LLMs) have demonstrated exceptional capabilities in general domains, yet their application in highly specialized and culturally-rich fields like Traditional Chinese Medicine (TCM) requires rigorous and nuanced evaluation. Building upon prior foundational work such as TCM-3CEval, which highlighted systemic knowledge gaps and the importance of cultural-contextual alignment, we introduce TCM-5CEval, a more granular and comprehensive benchmark. TCM-5CEval is designed to assess LLMs across five critical dimensions: (1) Core Knowledge (TCM-Exam), (2) Classical Literacy (TCM-LitQA), (3) Clinical Decision-making (TCM-MRCD), (4) Chinese Materia Medica (TCM-CMM), and (5) Clinical Non-pharmacological Therapy (TCM-ClinNPT). We conducted a thorough evaluation of fifteen prominent LLMs, revealing significant performance disparities and identifying top-performing models like deepseek\_r1 and gemini\_2\_5\_pro. Our findings show that while models exhibit proficiency in recalling foundational knowledge, they struggle with the interpretative complexities of classical texts. Critically, permutation-based consistency testing reveals widespread fragilities in model inference. All evaluated models, including the highest-scoring ones, displayed a substantial performance degradation when faced with varied question option ordering, indicating a pervasive sensitivity to positional bias and a lack of robust understanding. TCM-5CEval not only provides a more detailed diagnostic tool for LLM capabilities in TCM but aldso exposes fundamental weaknesses in their reasoning stability. To promote further research and standardized comparison, TCM-5CEval has been uploaded to the Medbench platform, joining its predecessor in the "In-depth Challenge for Comprehensive TCM Abilities" special track.
arXiv.org Artificial Intelligence
Nov-18-2025
- Country:
- Asia > China
- North America
- Canada > Alberta
- Census Division No. 5
- Kneehill County (0.24)
- Starland County (0.24)
- Census Division No. 7 > Stettler County No. 6 (0.24)
- Census Division No. 8 > Red Deer County (0.24)
- Census Division No. 5
- United States > Washington
- King County > Seattle (0.14)
- Canada > Alberta
- Oceania > Australia
- Australian Capital Territory > Canberra (0.04)
- Genre:
- Research Report > New Finding (1.00)
- Industry:
- Health & Medicine > Therapeutic Area (1.00)
- Technology: