Evaluating the performance and fragility of large language models on the self-assessment for neurological surgeons
