Answering real-world clinical questions using large language model based systems

Low, Yen Sia, Jackson, Michael L., Hyde, Rebecca J., Brown, Robert E., Sanghavi, Neil M., Baldwin, Julian D., Pike, C. William, Muralidharan, Jananee, Hui, Gavin, Alexander, Natasha, Hassan, Hadeel, Nene, Rahul V., Pike, Morgan, Pokrzywa, Courtney J., Vedak, Shivam, Yan, Adam Paul, Yao, Dong-han, Zipursky, Amy R., Dinh, Christina, Ballentine, Philip, Derieg, Dan C., Polony, Vladimir, Chawdry, Rehan N., Davies, Jordan, Hyde, Brigham B., Shah, Nigam H., Gombar, Saurabh

Jun-29-2024, arXiv.org Artificial Intelligence

Evidence to guide healthcare decisions is often limited by a lack of relevant and trustworthy literature as well as by difficulty in contextualizing existing research for a specific patient. Large language models (LLMs) could potentially address both challenges by either summarizing published literature or generating new studies based on real-world data (RWD). We evaluated the ability of five LLM-based systems to answer 50 clinical questions and had nine independent physicians review the responses for relevance, reliability, and actionability. As it stands, general-purpose LLMs (ChatGPT-4, Claude 3 Opus, Gemini Pro 1.5) rarely produced answers that were deemed relevant and evidence-based (2% to 10% of questions). In contrast, retrieval-augmented generation (RAG)-based and agentic LLM systems produced relevant and evidence-based answers for 24% (OpenEvidence) to 58% (ChatRWD) of questions. Only the agentic ChatRWD was able to answer novel questions (65% vs. 0-9% for the other LLMs). These results suggest that while general-purpose LLMs should not be used as-is, a purpose-built system for evidence summarization based on RAG, working synergistically with one for generating novel evidence, would improve the availability of pertinent evidence for patient care.

Country:
- North America
  - United States
    - New York > New York County
      - New York City (0.04)
    - Michigan > Washtenaw County
      - Ann Arbor (0.14)
    - California
      - Los Angeles County > Los Angeles (0.28)
      - Santa Clara County > Stanford (0.04)
      - San Diego County > San Diego (0.04)
  - Canada > Ontario
    - Toronto (0.04)
- Asia > Middle East
  - Jordan (0.04)

Genre:
- Research Report
  - Experimental Study (1.00)
  - New Finding (0.88)

Industry:
- Health & Medicine
  - Pharmaceuticals & Biotechnology (1.00)
  - Therapeutic Area
    - Oncology (1.00)
    - Immunology (1.00)
    - Cardiology/Vascular Diseases (1.00)
    - Neurology (0.94)
    - Gastroenterology (0.93)
    - Rheumatology (0.68)
    - Musculoskeletal (0.68)
    - Endocrinology > Diabetes (0.68)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)