CLUE: A Clinical Language Understanding Evaluation for LLMs

Amin Dada, Marie Bauer, Amanda Butler Contreras, Osman Alperen Koraş, Constantin Marc Seibold, Kaleb E. Smith, Jens Kleesiek

arXiv.org Artificial Intelligence 

Large Language Models (LLMs) are expected to significantly contribute to patient care, diagnostics, and administrative processes. Emerging biomedical LLMs aim to address healthcare-specific challenges, including privacy demands and computational constraints. Assessing the models' suitability for this sensitive application area is of the utmost importance. However, evaluation has primarily been limited to non-clinical tasks, which do not reflect the complexity of practical clinical applications. To fill this gap, we present the Clinical Language Understanding Evaluation (CLUE), a benchmark tailored to evaluate LLMs on clinical tasks. CLUE includes six tasks to test the practical applicability of LLMs in complex healthcare settings. Our evaluation includes a total of 25 LLMs. In contrast to previous evaluations, CLUE shows a decrease in performance for nine out of twelve biomedical models. Our benchmark represents a step towards a standardized approach to evaluating and developing LLMs in healthcare to align future model development with the real-world needs of clinical application. We open-source all evaluation scripts and datasets for future research at https://github.com/TIO-IKIM/CLUE.
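
To make the evaluation setup concrete, below is a minimal, hypothetical sketch of the kind of harness such a benchmark implies: a set of tasks, each with examples and a metric, averaged per model. All names here (Task, evaluate, exact_match, diagnosis_qa) are illustrative assumptions and do not reflect the actual API of the CLUE repository linked above.

```python
# Hypothetical benchmark-harness sketch; NOT the CLUE codebase's actual API.
from dataclasses import dataclass
from typing import Callable


@dataclass
class Task:
    name: str
    examples: list[tuple[str, str]]          # (prompt, reference answer) pairs
    score: Callable[[str, str], float]       # metric: (prediction, reference) -> score


def exact_match(prediction: str, reference: str) -> float:
    """Toy metric: 1.0 if the normalized strings match, else 0.0."""
    return float(prediction.strip().lower() == reference.strip().lower())


def evaluate(generate: Callable[[str], str], tasks: list[Task]) -> dict[str, float]:
    """Average each task's metric over its examples for a single model."""
    results: dict[str, float] = {}
    for task in tasks:
        scores = [task.score(generate(prompt), ref) for prompt, ref in task.examples]
        results[task.name] = sum(scores) / len(scores)
    return results


if __name__ == "__main__":
    # Stand-in "model": a trivial function, used only to show the interface.
    dummy_model = lambda prompt: "pneumonia"
    demo_task = Task(
        name="diagnosis_qa",  # hypothetical task, not one of CLUE's six
        examples=[("Cough, fever, infiltrate on X-ray. Diagnosis?", "pneumonia")],
        score=exact_match,
    )
    print(evaluate(dummy_model, [demo_task]))  # {'diagnosis_qa': 1.0}
```

In practice, a clinical benchmark would swap the dummy model for real LLM inference and use task-appropriate metrics (e.g., ROUGE for summarization, accuracy for QA); the per-task averaging structure stays the same.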
