A Spectrum Evaluation Benchmark for Medical Multi-Modal Large Language Models

Jie Liu, Wenxuan Wang, Yihang Su, Jingyuan Huan, Wenting Chen, Yudi Zhang, Cheng-Yi Li, Kao-Jung Chang, Xiaohan Xin, Linlin Shen, Michael R. Lyu

arXiv.org Artificial Intelligence 

Significant breakthroughs in Medical Multi-Modal Large Language Models (Med-MLLMs) are reshaping modern healthcare with robust information synthesis and medical decision support. However, these models are often evaluated on benchmarks that are ill-suited to Med-MLLMs, as they fail to capture the complexity of real-world diagnostics across diverse specialties. To address this gap, we introduce Asclepius, a novel Med-MLLM benchmark that comprehensively assesses Med-MLLMs along two axes: distinct medical specialties (cardiovascular, gastroenterology, etc.) and different diagnostic capacities (perception, disease analysis, etc.). Grounded in 3 proposed core principles, Asclepius ensures a comprehensive evaluation by encompassing 15 medical specialties, stratifying clinical tasks into 3 main categories and 8 sub-categories, and avoiding overlap with existing VQA datasets. We further provide an in-depth analysis of 6 Med-MLLMs and compare them with 3 human specialists, offering insights into their competencies and limitations in various medical contexts. Our work not only advances the understanding of Med-MLLMs' capabilities but also sets a precedent for future evaluations and the safe deployment of these models in clinical environments.
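
To illustrate the kind of stratification the abstract describes (specialties crossed with clinical-task categories and sub-categories), the sketch below shows one plausible way such benchmark items and a specialty-wise accuracy summary could be organized. The field names, labels, and exact-match scoring here are illustrative assumptions, not the paper's actual schema or evaluation protocol.

```python
# Hypothetical sketch of an Asclepius-style benchmark item and a
# per-specialty accuracy summary; all names below are assumptions.
from dataclasses import dataclass
from collections import defaultdict

@dataclass
class BenchmarkItem:
    specialty: str        # e.g. "cardiovascular", "gastroenterology"
    task_category: str    # one of the 3 main clinical-task categories
    sub_category: str     # one of the 8 sub-categories
    image_path: str       # medical image paired with the question
    question: str
    answer: str           # ground-truth answer

def specialty_accuracy(items, predictions):
    """Aggregate exact-match accuracy per medical specialty."""
    correct, total = defaultdict(int), defaultdict(int)
    for item, pred in zip(items, predictions):
        total[item.specialty] += 1
        if pred.strip().lower() == item.answer.strip().lower():
            correct[item.specialty] += 1
    return {s: correct[s] / total[s] for s in total}
```

Reporting results per specialty and per task sub-category in this way is what allows a Med-MLLM's scores to be compared directly against human specialists within each medical context.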