Simulating Viva Voce Examinations to Evaluate Clinical Reasoning in Large Language Models