Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions
