Benchmarking Large Language Models on CMExam - A Comprehensive Chinese Medical Exam Dataset

Jan-19-2025, 17:44:53 GMT–Neural Information Processing Systems

Recent advancements in large language models (LLMs) have transformed the field of question answering (QA). However, evaluating LLMs in the medical field is challenging due to the lack of standardized and comprehensive datasets. To address this gap, we introduce CMExam, sourced from the Chinese National Medical Licensing Examination. CMExam consists of 60K multiple-choice questions for standardized and objective evaluations, as well as solution explanations for model reasoning evaluation in an open-ended manner. For in-depth analyses of LLMs, we invited medical professionals to label five additional question-wise annotations, including disease groups, clinical departments, medical disciplines, areas of competency, and question difficulty levels.
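The annotation-based analysis described above can be pictured as multiple-choice accuracy broken down by one annotation field. The following is a minimal sketch of that idea, not the authors' evaluation code: the field names `answer` and `difficulty`, the helper `accuracy_by_group`, and the toy records are all hypothetical and do not reflect the actual CMExam schema or data.

```python
from collections import defaultdict


def accuracy_by_group(records, predictions, group_key):
    """Return {group value: accuracy} for multiple-choice predictions.

    `records` is a list of dicts holding the gold answer and annotations;
    `predictions` is a parallel list of predicted option letters.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for record, predicted in zip(records, predictions):
        group = record[group_key]
        total[group] += 1
        if predicted == record["answer"]:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


# Toy records with hypothetical field names; not real CMExam data.
records = [
    {"answer": "A", "difficulty": "easy"},
    {"answer": "C", "difficulty": "easy"},
    {"answer": "B", "difficulty": "hard"},
    {"answer": "D", "difficulty": "hard"},
]
predictions = ["A", "C", "B", "A"]

result = accuracy_by_group(records, predictions, "difficulty")
print(result)
```

Passing a different `group_key` (for example a clinical department or disease group field, if the data carried one under that name) would give the same breakdown along another annotation.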

cmexam, comprehensive chinese medical exam dataset, language model

Industry:
- Health & Medicine (0.98)

Technology:
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)