Structured Outputs Enable General-Purpose LLMs to be Medical Experts

Guangfu Guo, Kai Zhang, Bryan Hoo, Yujun Cai, Xiaoqian Lu, Nanyun Peng, Yiwei Wang

arXiv.org Artificial Intelligence 

Medical question-answering (QA) is a critical task for evaluating how effectively large language models (LLMs) encode clinical knowledge and for assessing their potential applications in medicine. Despite showing promise on multiple-choice tests, LLMs frequently struggle with open-ended medical questions, producing responses with dangerous hallucinations or incomplete coverage of critical aspects. Existing approaches attempt to address these challenges through domain-specific fine-tuning, but this proves resource-intensive and difficult to scale across models. To improve the comprehensiveness and factuality of medical responses, we propose a novel approach utilizing structured medical reasoning. Our method guides LLMs through a seven-step cognitive process inspired by clinical diagnosis, enabling more accurate and complete answers without additional training. Experiments on the MedLFQA benchmark demonstrate that our approach achieves the highest Factuality Score of 85.8, surpassing fine-tuned models. Notably, this improvement transfers to smaller models, highlighting the method's efficiency and scalability. Our code and datasets are available.
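Since the abstract only outlines the approach at a high level, the following is a minimal sketch of how such training-free structured prompting might be wired up. The seven step labels, the `build_structured_prompt` template, and the `call_llm` hook are illustrative assumptions, not the paper's actual prompt or API.

```python
# Illustrative sketch only: the paper's exact seven steps and prompt wording
# are not given in the abstract, so the step names below are hypothetical
# stand-ins for a clinically inspired reasoning sequence.

from typing import Callable

# Hypothetical seven-step structure loosely mirroring clinical diagnosis.
SEVEN_STEPS = [
    "1. Restate the question and identify key clinical terms.",
    "2. List relevant background knowledge (anatomy, physiology, epidemiology).",
    "3. Enumerate plausible causes or differential considerations.",
    "4. State the most likely answer with supporting evidence.",
    "5. Note contraindications, risks, or red-flag symptoms.",
    "6. Give practical guidance (management, when to seek care).",
    "7. Summarize the answer concisely and flag remaining uncertainty.",
]

def build_structured_prompt(question: str) -> str:
    """Wrap an open-ended medical question in a fixed step-by-step template,
    so the model emits its answer section by section instead of free-form."""
    steps = "\n".join(SEVEN_STEPS)
    return (
        "Answer the medical question below by working through each step in order.\n"
        "Label every section with its step number.\n\n"
        f"Steps:\n{steps}\n\nQuestion: {question}\n"
    )

def answer_medical_question(question: str, call_llm: Callable[[str], str]) -> str:
    """`call_llm` is any prompt -> completion function (API client or local
    model); no fine-tuning is involved, matching the training-free claim."""
    return call_llm(build_structured_prompt(question))

if __name__ == "__main__":
    # Stub LLM for a self-contained demo; replace with a real model call.
    demo = answer_medical_question(
        "What should I do about recurring migraines?",
        call_llm=lambda prompt: f"[model output for prompt of {len(prompt)} chars]",
    )
    print(demo)
```

Because the structure lives entirely in the prompt, the same template can be reused across models of different sizes, which is consistent with the transfer to smaller models reported above.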
