Beyond the Leaderboard: Rethinking Medical Benchmarks for Large Language Models
