Benchmarking Large Language Models on CMExam -- A Comprehensive Chinese Medical Exam Dataset