Towards Assessing Medical Ethics from Knowledge to Practice
Chang Hong, Minghao Wu, Qingying Xiao, Yuchi Wang, Xiang Wan, Guangjun Yu, Benyou Wang, Yan Hu
arXiv.org Artificial Intelligence
The integration of large language models into healthcare necessitates rigorous evaluation of their ethical reasoning, an area current benchmarks often overlook. We introduce PrinciplismQA, a comprehensive benchmark of 3,648 questions designed to systematically assess LLMs' alignment with core medical ethics. Grounded in Principlism, the benchmark features a high-quality dataset comprising multiple-choice questions curated from authoritative textbooks and open-ended questions drawn from authoritative medical ethics case-study literature, all validated by medical experts. Our experiments reveal a significant gap between models' ethical knowledge and its practical application, especially in dynamically applying ethical principles to real-world scenarios. Most LLMs struggle with dilemmas involving Beneficence, often over-emphasizing other principles. Frontier closed-source models, driven by strong general capabilities, currently lead the benchmark. Notably, medical-domain fine-tuning can enhance models' overall ethical competence, but further progress requires better alignment with medical ethical knowledge. PrinciplismQA offers a scalable framework for diagnosing these specific ethical weaknesses, paving the way for more balanced and responsible medical AI.
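To make the multiple-choice portion of such a benchmark concrete, a minimal evaluation loop might look like the sketch below. The item schema, the example questions, the principle labels, and the exact-match grading rule are all illustrative assumptions, not PrinciplismQA's actual data format or scoring protocol.

```python
# Hypothetical sketch of a PrinciplismQA-style multiple-choice evaluation.
# Each item pairs a clinical-ethics question with labeled options and the
# Principlism principle it targets (schema assumed for illustration).
items = [
    {
        "question": "A competent patient refuses a life-saving transfusion. "
                    "What should the care team do?",
        "options": {"A": "Transfuse anyway", "B": "Respect the refusal",
                    "C": "Delay until the patient is unconscious"},
        "answer": "B",
        "principle": "Autonomy",
    },
    {
        "question": "Resources allow treating only one of two equally sick "
                    "patients. How should the slot be assigned?",
        "options": {"A": "By ability to pay",
                    "B": "By a fair, transparent criterion",
                    "C": "By physician preference"},
        "answer": "B",
        "principle": "Justice",
    },
]

def mock_model(question: str, options: dict) -> str:
    """Stand-in for an LLM call; trivially picks option 'B' here."""
    return "B"

def accuracy_by_principle(items, model):
    """Score exact-match accuracy, grouped by ethical principle."""
    per = {}
    for it in items:
        correct = model(it["question"], it["options"]) == it["answer"]
        hits, total = per.get(it["principle"], (0, 0))
        per[it["principle"]] = (hits + int(correct), total + 1)
    return {p: hits / total for p, (hits, total) in per.items()}

print(accuracy_by_principle(items, mock_model))
```

Grouping accuracy by principle, as above, is what lets a benchmark surface per-principle weaknesses such as the Beneficence gap the abstract describes; the open-ended questions would need a separate, rubric-based grading step not shown here.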
Aug-8-2025