Overview of the ClinIQLink 2025 Shared Task on Medical Question-Answering
Colelough, Brandon; Bartels, Davis; Demner-Fushman, Dina
arXiv.org Artificial Intelligence
In this paper, we present an overview of ClinIQLink, a shared task co-located with the 24th BioNLP workshop at ACL 2025 and designed to stress-test large language models (LLMs) on medically oriented question answering aimed at the level of a General Practitioner. The challenge supplies 4,978 expert-verified question-answer pairs, each grounded in a medical source, that cover seven formats: true/false, multiple choice, unordered list, short answer, short-inverse, multi-hop, and multi-hop-inverse. Participating systems, bundled in Docker or Apptainer images, are executed on the CodaBench platform or the University of Maryland's Zaratan cluster. An automated harness (Task 1) scores closed-ended items by exact match and open-ended items with a three-tier embedding metric. A subsequent physician panel (Task 2) audits the top model responses.
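As a rough illustration of the exact-match scoring mentioned for closed-ended items, the sketch below computes mean exact-match accuracy over (prediction, gold) pairs. It is not the task's actual harness: the function names, the lowercase-and-whitespace normalization, and the sample data are all assumptions made for this example, and the three-tier embedding metric for open-ended items is not covered because the abstract does not specify it.

```python
from typing import Iterable, Tuple


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace (an assumed normalization, not the task's)."""
    return " ".join(answer.strip().lower().split())


def exact_match(prediction: str, gold: str) -> float:
    """Return 1.0 when the normalized strings are identical, else 0.0."""
    return 1.0 if normalize(prediction) == normalize(gold) else 0.0


def score_closed_ended(items: Iterable[Tuple[str, str]]) -> float:
    """Mean exact-match accuracy over (prediction, gold) pairs; 0.0 for empty input."""
    pairs = list(items)
    if not pairs:
        return 0.0
    return sum(exact_match(p, g) for p, g in pairs) / len(pairs)


# Hypothetical true/false and multiple-choice responses, for illustration only.
examples = [("True", "true"), ("B", "b"), ("false", "true")]
accuracy = score_closed_ended(examples)
print(f"exact-match accuracy: {accuracy:.3f}")
```

A real harness would likely need format-specific handling (for example, comparing sets for unordered-list items), which this sketch leaves out.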
Jun-30-2025
- Country:
- North America > United States
- Michigan (0.04)
- Pennsylvania > Philadelphia County
- Philadelphia (0.04)
- Maryland > Montgomery County
- Bethesda (0.04)
- Europe
- United Kingdom > England
- Cambridgeshire > Cambridge (0.04)
- Spain > Catalonia
- Barcelona Province > Barcelona (0.04)
- Genre:
- Overview (0.54)
- Research Report (0.50)
- Industry:
- Health & Medicine (1.00)
- Technology: