MIMIC-IV-Ext-22MCTS: A 22 Millions-Event Temporal Clinical Time-Series Dataset with Relative Timestamp for Risk Prediction
Wang, Jing, Niu, Xing, Zhang, Tong, Shen, Jie, Kim, Juyong, Weiss, Jeremy C.
arXiv.org Artificial Intelligence
A crucial component of developing a reliable clinical risk prediction model is collecting high-quality time-series clinical events. In this work, we release such a dataset, consisting of 22,588,586 clinical time-series events, which we term MIMIC-IV-Ext-22MCTS. Our source data are discharge summaries selected from the well-known yet unstructured MIMIC-IV-Note \cite{Johnson2023-pg}. The general-purpose MIMIC-IV-Note poses specific challenges for our work: the discharge summaries are too lengthy for typical natural language models to process, and the clinical events of interest are often not accompanied by explicit timestamps. We therefore propose a new framework that works as follows: 1) we break each discharge summary into manageably small text chunks; 2) we apply contextual BM25 and contextual semantic search to retrieve the chunks most likely to contain clinical events; and 3) we carefully design prompts that guide the recently released Llama-3.1-8B \cite{touvron2023llama} model to identify or infer temporal information for the chunks. The resulting dataset is informative and transparent enough that standard models fine-tuned on it achieve significant improvements in healthcare applications. In particular, the BERT model fine-tuned on our dataset achieves a 10% improvement in accuracy on a medical question-answering task and a 3% improvement on a clinical trial matching task compared with the classic BERT. The dataset is available at https://physionet.org/content/mimic-iv-ext-22mcts/1.0.0. The codebase is released at https://github.com/JingWang-RU/MIMIC-IV-Ext-22MCTS-Temporal-Clinical-Time-Series-Dataset.
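Steps 1 and 2 of the framework (chunking a long discharge summary, then ranking chunks by lexical relevance) can be illustrated with the rough sketch below. It is not the authors' released code, and it rests on our own assumptions: chunks are built by greedily packing whole sentences up to a hypothetical `max_words` budget, and ranking uses plain Okapi BM25 with the common defaults k1 = 1.5 and b = 0.75 and a Lucene-style non-negative IDF. It does not attempt the contextual variant of BM25 or the semantic-search component named in the abstract. The function names, example note, and query are invented for illustration.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, max_words: int = 60) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_words words.

    A single sentence longer than max_words becomes its own (oversized) chunk.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, count = [], [], 0
    for sent in sentences:
        if not sent:
            continue
        n = len(sent.split())
        # Start a new chunk if adding this sentence would exceed the budget.
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sent)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep alphanumeric tokens only."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query: str, chunks: list[str],
                k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every chunk against the query with Okapi BM25."""
    docs = [tokenize(c) for c in chunks]
    n_docs = len(docs)
    avg_len = (sum(len(d) for d in docs) / n_docs) or 1.0
    # Document frequency: number of chunks that contain each term.
    df = Counter(term for d in docs for term in set(d))
    query_terms = set(tokenize(query))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def top_k_chunks(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks with the highest BM25 score for the query."""
    if not chunks:
        return []
    scores = bm25_scores(query, chunks)
    ranked = sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)
    return [chunks[i] for i in ranked[:k]]


if __name__ == "__main__":
    note = ("Patient admitted with chest pain. Troponin elevated on day 2. "
            "Cardiac catheterization performed and stent placed. "
            "Discharged home on aspirin and clopidogrel.")
    chunks = chunk_text(note, max_words=8)
    print(top_k_chunks("troponin stent catheterization", chunks, k=1))
    # ['Cardiac catheterization performed and stent placed.']
```

Packing whole sentences, rather than cutting at a fixed word count, keeps phrases such as "Troponin elevated on day 2." intact, which matters when the later prompting step has to recover relative timing from the chunk text.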
Nov-19-2025
- Country:
  - Asia > Middle East
    - Iran > Tehran Province
      - Tehran (0.04)
    - Israel (0.04)
  - North America > United States
    - Illinois > Champaign County
      - Urbana (0.14)
    - Maryland > Montgomery County
      - Bethesda (0.04)
    - Massachusetts > Suffolk County
      - Boston (0.04)
    - New Jersey > Hudson County
      - Hoboken (0.04)
    - New York > New York County
      - New York City (0.04)
- Genre:
  - Research Report > Experimental Study (1.00)
- Industry:
  - Health & Medicine
    - Diagnostic Medicine (0.93)
    - Health Care Technology (1.00)
    - Pharmaceuticals & Biotechnology (1.00)
    - Therapeutic Area
      - Cardiology/Vascular Diseases (1.00)
      - Hematology (1.00)
      - Oncology (0.95)
- Technology: