Is larger always better? Evaluating and prompting large language models for non-generative medical tasks
