Zero-shot Generative Large Language Models for Systematic Review Screening Automation
Shuai Wang, Harrisen Scells, Shengyao Zhuang, Martin Potthast, Bevan Koopman, Guido Zuccon
arXiv.org Artificial Intelligence
Systematic reviews are crucial for evidence-based medicine as they comprehensively analyse published research findings on specific questions. Conducting such reviews is often resource- and time-intensive, especially in the screening phase, where abstracts of publications are assessed for inclusion in a review. This study investigates the effectiveness of using zero-shot large language models (LLMs) for automatic screening. We evaluate the effectiveness of eight different LLMs and investigate a calibration technique that uses a predefined recall threshold to determine whether a publication should be included in a systematic review. Our comprehensive evaluation using five standard test collections shows that instruction fine-tuning plays an important role in screening, that calibration renders LLMs practical for achieving a targeted recall, and that combining both with an ensemble of zero-shot models saves significant screening time compared to state-of-the-art approaches.
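The calibration idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each abstract already has a numeric inclusion score from some model and that a small labelled calibration set is available. The function names `calibrate_threshold` and `screen` are hypothetical.

```python
import math
from typing import List, Sequence


def calibrate_threshold(
    scores: Sequence[float], labels: Sequence[int], target_recall: float
) -> float:
    """Return the highest score cut-off whose recall on the calibration
    set is at least target_recall (labels: 1 = include, 0 = exclude)."""
    if not 0.0 < target_recall <= 1.0:
        raise ValueError("target_recall must be in (0, 1]")
    # Scores of the truly relevant documents, highest first.
    relevant = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    if not relevant:
        raise ValueError("calibration set contains no relevant documents")
    # How many relevant documents must be kept to reach the target recall.
    needed = max(1, math.ceil(target_recall * len(relevant)))
    return relevant[needed - 1]


def screen(scores: Sequence[float], threshold: float) -> List[bool]:
    """Mark a document for inclusion when its score reaches the threshold."""
    return [s >= threshold for s in scores]


# Toy calibration data (illustrative values only, not from the paper).
calib_scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.1]
calib_labels = [1, 1, 0, 1, 0, 0]
theta = calibrate_threshold(calib_scores, calib_labels, target_recall=1.0)
decisions = screen(calib_scores, theta)
```

Lowering `target_recall` raises the cut-off and excludes more documents, which is the trade-off between screening effort and missed studies that a recall target makes explicit.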
Jan-31-2024
- Country:
  - Europe
    - Germany > Saxony
      - Leipzig (0.04)
    - United Kingdom > England
      - Oxfordshire > Oxford (0.04)
  - North America > United States
    - New York > New York County > New York City (0.04)
  - Oceania > Australia
    - Queensland (0.04)
- Genre:
  - Research Report
    - Experimental Study (1.00)
    - New Finding (1.00)
- Industry:
  - Health & Medicine (1.00)
- Technology: