MedSlice: Fine-Tuned Large Language Models for Secure Clinical Note Sectioning

Davis, Joshua, Sounack, Thomas, Sciacca, Kate, Brain, Jessie M, Durieux, Brigitte N, Agaronnik, Nicole D, Lindvall, Charlotta

Jan-23-2025–arXiv.org Artificial Intelligence

Extracting sections from clinical notes is crucial for downstream analysis but is challenging because formatting varies widely and manual sectioning is labor-intensive. While proprietary large language models (LLMs) have shown promise, privacy concerns limit their accessibility. This study develops a pipeline for automated note sectioning using open-source LLMs, focusing on three sections: History of Present Illness, Interval History, and Assessment and Plan. We fine-tuned three open-source LLMs to extract sections using a curated dataset of 487 progress notes and compared the results with those of proprietary models (GPT-4o, GPT-4o mini). Internal and external validity were assessed via precision, recall, and F1 score. Fine-tuned Llama 3.1 8B outperformed GPT-4o (F1=0.92). On the external validity test set, performance remained high (F1=0.85). Fine-tuned open-source LLMs can surpass proprietary models in clinical note sectioning, offering advantages in cost, performance, and accessibility.
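
The abstract does not say how precision, recall, and F1 are computed for an extracted section. As a minimal sketch, assuming each section is represented by half-open character offsets into the note and scored by character overlap with the annotated span, the three metrics could be computed as follows. The function name `span_prf` and the offset representation are illustrative assumptions, not the authors' evaluation code.

```python
from typing import Optional, Tuple

Span = Tuple[int, int]  # half-open character offsets: (start, end)


def span_prf(pred: Optional[Span], gold: Optional[Span]) -> Tuple[float, float, float]:
    """Character-overlap precision, recall, and F1 for one section.

    Hypothetical scoring scheme: `pred` is the span a model extracted,
    `gold` is the annotated span; either may be None when absent.
    """
    pred_len = pred[1] - pred[0] if pred else 0
    gold_len = gold[1] - gold[0] if gold else 0

    # Overlap is the length of the intersection of the two intervals.
    if pred and gold:
        overlap = max(0, min(pred[1], gold[1]) - max(pred[0], gold[0]))
    else:
        overlap = 0

    precision = overlap / pred_len if pred_len else 0.0
    recall = overlap / gold_len if gold_len else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Predicted span covers the first half of the annotated section.
    print(span_prf((0, 20), (0, 40)))
```

Under this scheme, a prediction that stops early keeps perfect precision but loses recall, while one that runs past the section boundary does the opposite; F1 balances the two.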

large language model, machine learning, natural language

Country:
- North America
  - United States (0.14)
  - Canada > Quebec
    - Montreal (0.04)

Genre:
- Research Report
  - Experimental Study (0.94)
  - New Finding (0.93)

Industry:
- Health & Medicine
  - Health Care Technology > Medical Record (1.00)
  - Health Care Providers & Services (1.00)
  - Therapeutic Area > Oncology (0.70)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
