Reliable and diverse evaluation of LLM medical knowledge mastery