Overview of the ClinIQLink 2025 Shared Task on Medical Question-Answering

Colelough, Brandon, Bartels, Davis, Demner-Fushman, Dina

Jun-30-2025–arXiv.org Artificial Intelligence

In this paper, we present an overview of ClinIQLink, a shared task, collocated with the 24th BioNLP workshop at ACL 2025, designed to stress-test large language models (LLMs) on medically-oriented question answering aimed at the level of a General Practitioner. The challenge supplies 4,978 expert-verified, medical source-grounded question-answer pairs that cover seven formats: true/false, multiple choice, unordered list, short answer, short-inverse, multi-hop, and multi-hop-inverse. Participating systems, bundled in Docker or Apptainer images, are executed on the CodaBench platform or the University of Maryland's Zaratan cluster. An automated harness (Task 1) scores closed-ended items by exact match and open-ended items with a three-tier embedding metric. A subsequent physician panel (Task 2) audits the top model responses.

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

Jun-30-2025

arXiv.org PDF

Add feedback

Country:
- North America > United States
  - Michigan (0.04)
  - Pennsylvania > Philadelphia County
    - Philadelphia (0.04)
  - Maryland > Montgomery County
    - Bethesda (0.04)
- Europe
  - United Kingdom > England
    - Cambridgeshire > Cambridge (0.04)
  - Spain > Catalonia
    - Barcelona Province > Barcelona (0.04)

Genre:
- Overview (0.54)
- Research Report (0.50)

Industry:
- Health & Medicine (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.31)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found