Mimic-IV-ICD: A new benchmark for eXtreme MultiLabel Classification

Nguyen, Thanh-Tung, Schlegel, Viktor, Kashyap, Abhinav, Winkler, Stefan, Huang, Shao-Syuan, Liu, Jie-Jyun, Lin, Chih-Jen

Apr-27-2023–arXiv.org Artificial Intelligence

Clinical notes are assigned ICD codes, which are sets of codes for diagnoses and procedures. In recent years, predictive machine learning models have been built for automatic ICD coding. However, there is a lack of widely accepted benchmarks for automated ICD coding models based on large-scale public EHR data. This paper proposes a public benchmark suite for ICD-10 coding using a large EHR dataset derived from MIMIC-IV, the most recent public EHR dataset. We implement and compare several popular methods for ICD coding prediction, standardizing data preprocessing and establishing a comprehensive ICD coding benchmark dataset. This approach fosters reproducibility and model comparison, accelerating progress toward employing automated ICD coding in future studies. Furthermore, we create a new ICD-9 benchmark using MIMIC-IV data, providing more data points and a higher number of ICD codes than MIMIC-III. Our open-source code offers easy access to data processing steps, benchmark creation, and experiment replication for those with MIMIC-IV access, providing insights, guidance, and protocols to efficiently develop ICD coding models.

Country:
- Asia (0.68)
- North America > United States (1.00)

Genre:
- Research Report (0.82)

Industry:
- Health & Medicine
  - Health Care Providers & Services (1.00)
  - Health Care Technology > Medical Record (0.67)
  - Pharmaceuticals & Biotechnology (1.00)
  - Therapeutic Area
    - Cardiology/Vascular Diseases (0.68)
    - Endocrinology (0.68)
    - Gastroenterology (0.93)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks
    - Deep Learning (0.93)
  - Natural Language (1.00)
