Human-AI collectives produce the most accurate differential diagnoses
Zöller, N., Berger, J., Lin, I., Fu, N., Komarneni, J., Barabucci, G., Laskowski, K., Shia, V., Harack, B., Chu, E. A., Trianni, V., Kurvers, R. H. J. M., Herzog, S. M.
arXiv.org Artificial Intelligence
Artificial intelligence systems, particularly large language models (LLMs), are increasingly being employed in high-stakes decisions that impact both individuals and society at large, often without adequate safeguards to ensure safety, quality, and equity. Yet LLMs hallucinate [1-4], lack common sense [5], and are biased [6, 7]--shortcomings that may reflect LLMs' inherent limitations and thus may not be remedied by more sophisticated architectures, more data, or more human feedback. Relying solely on LLMs for complex, high-stakes decisions is therefore problematic. Here we present a hybrid collective intelligence system that mitigates these risks by leveraging the complementary strengths of human experience and the vast information processed by LLMs. We show that hybrid collectives of physicians and LLMs outperform both single physicians and physician collectives, as well as single LLMs and LLM ensembles. This result holds across a range of medical specialties and professional experience, and can be attributed to humans' and LLMs' complementary contributions, which lead to different kinds of errors. Our approach highlights the potential for collective human and machine intelligence to improve accuracy in complex, open-ended domains [8] like medical diagnostics.

Diagnostic errors are among the most pressing issues in medical practice [9-11], causing an estimated 795,000 deaths and permanent disabilities in the United States alone each year [12]. Reducing diagnostic errors--without incurring substantially higher costs--is essential to improve patient outcomes worldwide. This challenge has motivated a recent surge in diagnostic technologies exploiting artificial intelligence (AI) to interpret medical records, tests, and images [13, 14]. Deep learning approaches in medical imaging have shown great promise.
Notable examples include mammography interpretation, cardiac function assessment, and lung cancer screening, some of which have progressed beyond the testing phase and entered clinical practice [15-17]. Recent years have also witnessed the rise of AI foundation models, especially LLMs, which show remarkable abilities to process natural language, providing accurate answers to questions in almost any domain, including medicine [18-21]. However, a recent meta-analysis [22] found that physicians often outperform LLMs, and that LLMs vary widely in performance, including across medical specialties.
Jun-21-2024