CLUE: A Clinical Language Understanding Evaluation for LLMs
Dada, Amin, Bauer, Marie, Contreras, Amanda Butler, Koraş, Osman Alperen, Seibold, Constantin Marc, Smith, Kaleb E, Kleesiek, Jens
arXiv.org Artificial Intelligence
Large Language Models (LLMs) are expected to significantly contribute to patient care, diagnostics, and administrative processes. Emerging biomedical LLMs aim to address healthcare-specific challenges, including privacy demands and computational constraints. Assessing the models' suitability for this sensitive application area is of the utmost importance. However, evaluation has primarily been limited to non-clinical tasks, which do not reflect the complexity of practical clinical applications. To fill this gap, we present the Clinical Language Understanding Evaluation (CLUE), a benchmark tailored to evaluate LLMs on clinical tasks. CLUE includes six tasks to test the practical applicability of LLMs in complex healthcare settings. Our evaluation includes a total of 25 LLMs. In contrast to previous evaluations, CLUE shows a decrease in performance for nine out of twelve biomedical models. Our benchmark represents a step towards a standardized approach to evaluating and developing LLMs in healthcare to align future model development with the real-world needs of clinical application. We open-source all evaluation scripts and datasets for future research at https://github.com/TIO-IKIM/CLUE.
Jun-24-2024
- Country:
  - North America > United States
    - Minnesota > Hennepin County
      - Minneapolis (0.14)
    - California > Santa Clara County
      - Santa Clara (0.04)
  - Europe
  - Asia > China
    - Hong Kong (0.04)
- Genre:
  - Research Report
    - New Finding (1.00)
    - Experimental Study (1.00)
- Industry:
  - Information Technology > Security & Privacy (0.67)
  - Health & Medicine
    - Pharmaceuticals & Biotechnology (1.00)
    - Diagnostic Medicine (1.00)
    - Health Care Technology (0.68)
    - Therapeutic Area
      - Infections and Infectious Diseases (1.00)
      - Immunology (1.00)
      - Cardiology/Vascular Diseases (1.00)
      - Pulmonary/Respiratory Diseases (0.67)
      - Oncology (0.67)
- Technology: