HealthBench: Evaluating Large Language Models Towards Improved Human Health
