Zero-shot Generative Large Language Models for Systematic Review Screening Automation

Wang, Shuai, Scells, Harrisen, Zhuang, Shengyao, Potthast, Martin, Koopman, Bevan, Zuccon, Guido

Jan-31-2024–arXiv.org Artificial Intelligence

Systematic reviews are crucial for evidence-based medicine as they comprehensively analyse published research findings on specific questions. Conducting such reviews is often resource- and time-intensive, especially in the screening phase, where abstracts of publications are assessed for inclusion in a review. This study investigates the effectiveness of using zero-shot large language models (LLMs) for automatic screening. We evaluate the effectiveness of eight different LLMs and investigate a calibration technique that uses a predefined recall threshold to determine whether a publication should be included in a systematic review. Our comprehensive evaluation using five standard test collections shows that instruction fine-tuning plays an important role in screening, that calibration renders LLMs practical for achieving a targeted recall, and that combining both with an ensemble of zero-shot models saves significant screening time compared to state-of-the-art approaches.
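The abstract does not spell out how the calibration works, so the following is only a minimal sketch of one way a recall target can be turned into an inclusion cutoff. It assumes that each publication receives a numeric inclusion score from the LLM and that a small set of publications with known include/exclude labels is available for calibration. The function names `calibrate_threshold` and `screen`, and the example scores, are hypothetical and are not taken from the paper.

```python
import math


def calibrate_threshold(scores, labels, target_recall):
    """Return the highest score cutoff whose recall on the labelled
    calibration set is at least target_recall (hypothetical helper)."""
    # Scores of the publications known to be relevant, highest first.
    positives = sorted((s for s, y in zip(scores, labels) if y), reverse=True)
    if not positives:
        raise ValueError("need at least one relevant publication to calibrate")
    # Number of relevant publications that must score at or above the cutoff.
    needed = max(1, math.ceil(target_recall * len(positives)))
    return positives[needed - 1]


def screen(scores, threshold):
    """Mark a publication for inclusion when its score reaches the cutoff."""
    return [s >= threshold for s in scores]


# Illustrative calibration data: LLM scores and known relevance labels.
cal_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
cal_labels = [1, 0, 1, 1, 0, 1]

threshold = calibrate_threshold(cal_scores, cal_labels, target_recall=0.75)
decisions = screen(cal_scores, threshold)
```

With these illustrative numbers, three of the four relevant publications must be retained to reach a recall of 0.75, so the cutoff falls on the third-highest relevant score. Lowering the cutoff raises recall at the cost of more publications left for manual screening.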



Country:
- Oceania > Australia
  - Queensland (0.04)
- North America > United States
  - New York > New York County > New York City (0.04)
- Europe
  - United Kingdom > England
    - Oxfordshire > Oxford (0.04)
  - Germany > Saxony
    - Leipzig (0.04)

Genre:
- Research Report
  - New Finding (1.00)
  - Experimental Study (1.00)

Industry:
- Health & Medicine (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.54)