MedEthicsQA: A Comprehensive Question Answering Benchmark for Medical Ethics Evaluation of LLMs

Wei, Jianhui, Meng, Zijie, Xiao, Zikai, Hu, Tianxiang, Feng, Yang, Zhou, Zhijie, Wu, Jian, Liu, Zuozhu

Jul-1-2025, arXiv.org Artificial Intelligence

While Medical Large Language Models (MedLLMs) have demonstrated remarkable potential in clinical tasks, their ethical safety remains insufficiently explored. This paper introduces MedEthicsQA, a comprehensive benchmark comprising 5,623 multiple-choice questions and 5,351 open-ended questions for evaluating the medical ethics of LLMs. We systematically establish a hierarchical taxonomy integrating global medical ethical standards. The benchmark draws on widely used medical datasets, authoritative question banks, and scenarios derived from PubMed literature. Rigorous quality control, involving multi-stage filtering and multi-faceted expert validation, ensures the reliability of the dataset, with a low error rate (2.72%). Evaluation shows that state-of-the-art MedLLMs perform worse on medical ethics questions than their foundation counterparts, revealing deficiencies in medical ethics alignment. The dataset, released under the CC BY-NC 4.0 license, is available at https://github.com/JianhuiWei7/MedEthicsQA.

large language model, machine learning, natural language


Country:
- Oceania > Australia (0.04)
- South America > Brazil (0.04)
- Africa > South Africa (0.04)
- North America
  - Canada (0.04)
  - United States > Florida
    - Miami-Dade County > Miami (0.04)
- Europe > United Kingdom
  - England > Oxfordshire > Oxford (0.04)
- Asia
  - Japan (0.04)
  - India (0.04)
  - China (0.04)
  - Singapore > Central Region
    - Singapore (0.04)

Genre:
- Research Report > New Finding (0.46)

Industry:
- Health & Medicine (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.74)
