Leveraging Group Relative Policy Optimization to Advance Large Language Models in Traditional Chinese Medicine

Xie, Jiacheng; Zeng, Shuai; Yu, Yang; Tang, Xiaoting; An, Guanghui; Xu, Dong

Oct-21-2025–arXiv.org Artificial Intelligence

Traditional Chinese Medicine (TCM) presents a rich and structurally unique knowledge system that challenges conventional applications of large language models (LLMs). Although previous TCM-specific LLMs have shown progress through supervised fine-tuning, they often face limitations in alignment, data quality, and evaluation consistency. In this study, we introduce Ladder-base, the first TCM-focused LLM trained with Group Relative Policy Optimization (GRPO), a reinforcement learning method that improves reasoning and factual consistency by optimizing response selection based on intra-group comparisons. Ladder-base is built upon the Qwen2.5-7B-Instruct foundation model and trained exclusively on the textual subset of the TCM-Ladder benchmark, using 80 percent of the data for training and the remaining 20 percent split evenly between validation and test sets. Through standardized evaluation, Ladder-base demonstrates superior performance across multiple reasoning metrics when compared to both state-of-the-art general-purpose LLMs such as GPT-4, Gemini 2.5, Claude 3, and Qwen3 and domain-specific TCM models including BenTsao, HuatuoGPT2, and Zhongjing. These findings suggest that GRPO provides an effective and efficient strategy for aligning LLMs with expert-level reasoning in traditional medical domains and supports the development of trustworthy and clinically grounded TCM artificial intelligence systems.
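
The two mechanics named in the abstract, intra-group comparison and the 80/10/10 data split, can be illustrated with a minimal sketch. It assumes the commonly described GRPO formulation, in which each sampled response's reward is standardized against the mean and standard deviation of the other responses to the same prompt. The function names, the use of a population standard deviation, the `eps` term, and the fixed shuffle seed are illustrative assumptions; the abstract does not specify the reward function or the splitting procedure actually used for Ladder-base.

```python
import random
import statistics
from typing import List, Sequence, Tuple


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Standardize rewards within one group of responses to the same prompt.

    Hypothetical helper: a response scores a positive advantage only if it
    beats the average of its own group, so no separate value model is needed.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; an assumption
    # eps keeps the division finite when every response gets the same reward
    return [(r - mean) / (std + eps) for r in rewards]


def split_80_10_10(items: Sequence, seed: int = 0) -> Tuple[list, list, list]:
    """Shuffle and split into 80% train, then halve the rest into val/test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = len(shuffled) * 8 // 10
    n_val = (len(shuffled) - n_train) // 2
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


if __name__ == "__main__":
    # Four sampled answers to one question; 1.0 = judged correct, 0.0 = incorrect
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
    train, val, test = split_80_10_10(range(100))
    print(len(train), len(val), len(test))
```

In this sketch, correct answers in a mixed group receive positive advantages and incorrect ones negative, while a group whose responses all earn the same reward contributes zero advantage and therefore no learning signal.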
