Towards Assessing Medical Ethics from Knowledge to Practice
Chang Hong, Minghao Wu, Qingying Xiao, Yuchi Wang, Xiang Wan, Guangjun Yu, Benyou Wang, Yan Hu
arXiv.org Artificial Intelligence
The integration of large language models into healthcare necessitates rigorous evaluation of their ethical reasoning, an area current benchmarks often overlook. We introduce PrinciplismQA, a comprehensive benchmark of 3,648 questions designed to systematically assess LLMs' alignment with core medical ethics. Grounded in Principlism, the benchmark features a high-quality dataset comprising multiple-choice questions curated from authoritative textbooks and open-ended questions drawn from authoritative medical ethics case-study literature, all validated by medical experts. Our experiments reveal a significant gap between models' ethical knowledge and its practical application, especially in dynamically applying ethical principles to real-world scenarios. Most LLMs struggle with dilemmas involving Beneficence, often over-emphasizing other principles. Frontier closed-source models, driven by strong general capabilities, currently lead the benchmark. Notably, medical-domain fine-tuning can enhance models' overall ethical competence, but further progress requires better alignment with medical ethical knowledge. PrinciplismQA offers a scalable framework for diagnosing these specific ethical weaknesses, paving the way for more balanced and responsible medical AI.
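To make the multiple-choice portion of such a benchmark concrete, a minimal evaluation loop might look like the sketch below. The item schema, the example questions, the principle labels, and the exact-match grading rule are all illustrative assumptions, not PrinciplismQA's actual data format or scoring protocol.

```python
# Hypothetical sketch of a PrinciplismQA-style multiple-choice evaluation.
# Each item pairs a clinical-ethics question with labeled options and the
# Principlism principle it targets (schema assumed for illustration).
items = [
    {
        "question": "A competent patient refuses a life-saving transfusion. "
                    "What should the care team do?",
        "options": {"A": "Transfuse anyway", "B": "Respect the refusal",
                    "C": "Delay until the patient is unconscious"},
        "answer": "B",
        "principle": "Autonomy",
    },
    {
        "question": "Resources allow treating only one of two equally sick "
                    "patients. How should the slot be assigned?",
        "options": {"A": "By ability to pay",
                    "B": "By a fair, transparent criterion",
                    "C": "By physician preference"},
        "answer": "B",
        "principle": "Justice",
    },
]

def mock_model(question: str, options: dict) -> str:
    """Stand-in for an LLM call; trivially picks option 'B' here."""
    return "B"

def accuracy_by_principle(items, model):
    """Score exact-match accuracy, grouped by ethical principle."""
    per = {}
    for it in items:
        correct = model(it["question"], it["options"]) == it["answer"]
        hits, total = per.get(it["principle"], (0, 0))
        per[it["principle"]] = (hits + int(correct), total + 1)
    return {p: hits / total for p, (hits, total) in per.items()}

print(accuracy_by_principle(items, mock_model))
```

Grouping accuracy by principle, as above, is what lets a benchmark surface per-principle weaknesses such as the Beneficence gap the abstract describes; the open-ended questions would need a separate, rubric-based grading step not shown here.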
Aug-8-2025