A Study of Large Language Models for Patient Information Extraction: Model Architecture, Fine-Tuning Strategy, and Multi-task Instruction Tuning

Cheng Peng, Xinyu Dong, Mengxian Lyu, Daniel Paredes, Yaoyun Zhang, Yonghui Wu

arXiv.org Artificial Intelligence 

Keywords: Clinical information extraction; Large language model; Clinical concept extraction; Clinical relation extraction; Instruction tuning

ABSTRACT

Background: Natural language processing (NLP) is a key technology for extracting important patient information from clinical narratives to support healthcare applications. The rapid development of large language models (LLMs) has revolutionized many NLP tasks in the clinical domain, yet their optimal use in patient information extraction tasks requires further exploration. This study examines LLMs' effectiveness in patient information extraction, focusing on LLM architectures, fine-tuning strategies, and multi-task instruction tuning techniques for developing robust and generalizable patient information extraction systems.

Methods: This study explores key aspects of using LLMs for clinical concept and relation extraction tasks, including: (1) encoder-only versus decoder-only LLM architectures, (2) prompt-based parameter-efficient fine-tuning (PEFT) algorithms, and (3) the effect of multi-task instruction tuning on few-shot learning performance. We benchmarked a suite of LLMs, including encoder-based LLMs (BERT, GatorTron) and decoder-based LLMs (GatorTronGPT, Llama 3.1, GatorTronLlama), across five datasets. We compared traditional full-size fine-tuning with prompt-based PEFT, and explored a multi-task instruction tuning framework that combines both tasks across four datasets to evaluate zero-shot and few-shot learning performance using a leave-one-dataset-out strategy.

Results: For single-task clinical concept extraction, the two decoder-based LLMs (Llama 3.1 and GatorTronLlama) achieved the best performance, with average F1 scores of 0.8964 and 0.8981, respectively, across the five datasets, outperforming the other LLMs by 0.7~3.3% in average F1. Encoder-based LLMs with prompt-based learning outperformed those implemented with classification heads.
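The leave-one-dataset-out strategy described in the Methods can be sketched as follows. This is a minimal illustration of the split logic only, not the authors' actual code: the dataset names and the helper function are hypothetical placeholders, and the training/evaluation steps are omitted.

```python
# Hedged sketch of leave-one-dataset-out evaluation: for each of the four
# instruction-tuning datasets, train on the other three and hold the
# remaining one out for zero-/few-shot evaluation.
# Dataset names below are illustrative placeholders, not the study's datasets.
datasets = ["dataset_a", "dataset_b", "dataset_c", "dataset_d"]

def leave_one_dataset_out(names):
    """Yield (train_sets, held_out) pairs, holding out each dataset once."""
    for held_out in names:
        train_sets = [d for d in names if d != held_out]
        yield train_sets, held_out

splits = list(leave_one_dataset_out(datasets))
for train_sets, held_out in splits:
    # In the study, a multi-task instruction-tuned model would be trained
    # on `train_sets` here, then scored zero-/few-shot on `held_out`.
    print(f"train on {train_sets}, evaluate on {held_out}")
```

With four datasets this yields four train/evaluate rounds, so every dataset serves once as the unseen target, which is what makes the zero-shot and few-shot comparison across datasets possible.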