GatorTronGPT
A Study of Large Language Models for Patient Information Extraction: Model Architecture, Fine-Tuning Strategy, and Multi-task Instruction Tuning
Peng, Cheng, Dong, Xinyu, Lyu, Mengxian, Paredes, Daniel, Zhang, Yaoyun, Wu, Yonghui
Keywords: clinical information extraction; large language model; clinical concept extraction; clinical relation extraction; instruction tuning
ABSTRACT. Background: Natural language processing (NLP) is a key technology to extract important patient information from clinical narratives to support healthcare applications. The rapid development of large language models (LLMs) has revolutionized many NLP tasks in the clinical domain, yet their optimal use for patient information extraction requires further exploration. This study examines the effectiveness of LLMs for patient information extraction, focusing on LLM architectures, fine-tuning strategies, and multi-task instruction tuning techniques for developing robust and generalizable patient information extraction systems. Methods: This study explores key aspects of using LLMs for clinical concept and relation extraction tasks, including: (1) encoder-only versus decoder-only LLM architectures, (2) prompt-based parameter-efficient fine-tuning (PEFT) algorithms, and (3) the effect of multi-task instruction tuning on few-shot learning performance. We benchmarked a suite of LLMs, including encoder-based LLMs (BERT, GatorTron) and decoder-based LLMs (GatorTronGPT, Llama 3.1, GatorTronLlama), across five datasets. We compared traditional full-size fine-tuning and prompt-based PEFT. We also explored a multi-task instruction tuning framework that combines both tasks across four datasets to evaluate zero-shot and few-shot learning performance using a leave-one-dataset-out strategy. Results: For single-task clinical concept extraction, the two decoder-based LLMs (Llama 3.1 and GatorTronLlama) achieved the best performance, with average F1 scores of 0.8964 and 0.8981, respectively, across the five datasets, outperforming the other LLMs by 0.7~3.3% in average F1. Encoder-based LLMs with prompt-based learning outperformed those implemented using classification.
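The leave-one-dataset-out protocol described in this abstract can be sketched as a simple split-generation loop; the dataset names below are hypothetical placeholders, not the study's actual corpora:

```python
# Leave-one-dataset-out evaluation: instruction-tune on all datasets
# except one, then measure zero-/few-shot performance on the held-out set.
DATASETS = ["ds_a", "ds_b", "ds_c", "ds_d"]  # hypothetical dataset names

def leave_one_dataset_out(datasets):
    """Yield (train_sets, held_out) splits, one fold per dataset."""
    for held_out in datasets:
        train_sets = [d for d in datasets if d != held_out]
        yield train_sets, held_out

splits = list(leave_one_dataset_out(DATASETS))
# With four datasets, each fold trains on three and evaluates on the rest.
for train_sets, held_out in splits:
    assert held_out not in train_sets and len(train_sets) == 3
```

Each held-out fold measures how well the multi-task instruction-tuned model generalizes to a dataset it never saw during tuning.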
UF-HOBI at "Discharge Me!": A Hybrid Solution for Discharge Summary Generation Through Prompt-based Tuning of GatorTronGPT Models
Lyu, Mengxian, Peng, Cheng, Paredes, Daniel, Chen, Ziyi, Chen, Aokun, Bian, Jiang, Wu, Yonghui
Automatic generation of discharge summaries presents significant challenges due to the length of clinical documentation, the dispersed nature of patient information, and the diverse terminology used in healthcare. This paper presents a hybrid solution for generating discharge summary sections as part of our participation in the "Discharge Me!" Challenge at the BioNLP 2024 Shared Task. We developed a two-stage generation method combining extractive and abstractive techniques: we first apply named entity recognition (NER) to extract key clinical concepts, which are then used as input to a prompt-tuning-based GatorTronGPT model to generate coherent text for two key sections, "Brief Hospital Course" and "Discharge Instructions". Our system ranked 5th in this challenge, achieving an overall score of 0.284. The results demonstrate the effectiveness of our hybrid solution in improving the quality of automated discharge section generation.
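The two-stage extract-then-generate pipeline described above can be sketched with stub functions; `extract_concepts` and `generate_section` below are hypothetical stand-ins for the NER model and the prompt-tuned GatorTronGPT model, not the authors' implementation:

```python
# Stage 1: extractive NER pulls key clinical concepts from the note.
# Stage 2: abstractive generation drafts a section conditioned on them.

def extract_concepts(note: str) -> list[str]:
    """Stub for the NER model: match words against a tiny concept vocabulary."""
    vocab = {"pneumonia", "ceftriaxone", "fever"}
    return [w.strip(".,") for w in note.lower().split() if w.strip(".,") in vocab]

def generate_section(section: str, concepts: list[str]) -> str:
    """Stub for the generative model: draft a section from the concepts."""
    return f"{section}: patient course involving {', '.join(concepts)}."

note = "Admitted with fever and pneumonia. Treated with ceftriaxone."
concepts = extract_concepts(note)
summary = generate_section("Brief Hospital Course", concepts)
```

Grounding the generator on explicitly extracted concepts is what makes the solution "hybrid": the abstractive stage does not have to rediscover key facts buried in long documentation.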
Automatic Summarization of Doctor-Patient Encounter Dialogues Using Large Language Model through Prompt Tuning
Lyu, Mengxian, Peng, Cheng, Li, Xiaohan, Balian, Patrick, Bian, Jiang, Wu, Yonghui
Automatic text summarization (ATS) is an emerging technology to assist clinicians in providing continuous and coordinated care. This study presents an approach to summarizing doctor-patient dialogues using generative large language models (LLMs). We developed prompt-tuning algorithms to instruct generative LLMs to summarize clinical text. We examined the prompt-tuning strategies, the size of soft prompts, and the few-shot learning ability of GatorTronGPT, a generative clinical LLM developed using 277 billion clinical and general English words with up to 20 billion parameters. We compared GatorTronGPT with a previous solution based on fine-tuning of the widely used T5 model, using the clinical benchmark dataset MTS-DIALOG. The experimental results show that the GatorTronGPT-20B model achieved the best performance on all evaluation metrics. The proposed solution has a low computing cost because the LLM parameters are not updated during prompt tuning. This study demonstrates the efficiency of generative clinical LLMs for clinical ATS through prompt tuning.
Generative Large Language Models Are All-purpose Text Analytics Engines: Text-to-text Learning Is All Your Need
Peng, Cheng, Yang, Xi, Chen, Aokun, Yu, Zehao, Smith, Kaleb E, Costa, Anthony B, Flores, Mona G, Bian, Jiang, Wu, Yonghui
Objective: To solve major clinical natural language processing (NLP) tasks using a unified text-to-text learning architecture based on a generative large language model (LLM) via prompt tuning. Methods: We formulated 7 key clinical NLP tasks as text-to-text learning and solved them using one unified generative clinical LLM, GatorTronGPT, developed using the GPT-3 architecture and trained with up to 20 billion parameters. We adopted soft prompts (i.e., trainable vectors) with a frozen LLM, where the LLM parameters were not updated (i.e., frozen) and only the vectors of the soft prompts were updated, known as prompt tuning. We added the soft prompts as a prefix to the input layer and optimized them during prompt tuning. We evaluated the proposed method on the 7 clinical NLP tasks and compared it with previous task-specific solutions based on Transformer models. Results and Conclusion: The proposed approach achieved state-of-the-art performance for 5 out of 7 major clinical NLP tasks using one unified generative LLM. Our approach outperformed previous task-specific transformer models by ~3% for concept extraction and 7% for relation extraction applied to social determinants of health, 3.4% for clinical concept normalization, 3.4~10% for clinical abbreviation disambiguation, and 5.5~9% for natural language inference. Our approach also outperformed a previously developed prompt-based machine reading comprehension (MRC) model, GatorTron-MRC, for clinical concept and relation extraction. The proposed approach can deliver the "one model for all" promise from training to deployment using a unified generative LLM.
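The prompt-tuning mechanism described in this abstract (trainable soft-prompt vectors prepended as a prefix to the input embeddings of a frozen LLM) can be sketched with NumPy; the dimensions and variable names below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 8    # embedding dimension (illustrative)
n_prompt = 4   # number of soft-prompt vectors
seq_len = 6    # number of input tokens

# Trainable soft prompts: the only parameters updated during prompt tuning.
soft_prompts = rng.normal(size=(n_prompt, d_model))

# Token embeddings from the frozen LLM's embedding table (not updated).
token_embeds = rng.normal(size=(seq_len, d_model))

# Prompt tuning prepends the soft prompts as a prefix at the input layer;
# during training, gradients flow only into `soft_prompts`.
model_input = np.concatenate([soft_prompts, token_embeds], axis=0)
assert model_input.shape == (n_prompt + seq_len, d_model)
```

Because only the small soft-prompt matrix is trained while the billions of LLM parameters stay frozen, the computing cost is low and one deployed model can serve many tasks by swapping prompt prefixes.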
A Study of Generative Large Language Model for Medical Research and Healthcare
Peng, Cheng, Yang, Xi, Chen, Aokun, Smith, Kaleb E, PourNejatian, Nima, Costa, Anthony B, Martin, Cheryl, Flores, Mona G, Zhang, Ying, Magoc, Tanja, Lipori, Gloria, Mitchell, Duane A, Ospina, Naykky S, Ahmed, Mustafa M, Hogan, William R, Shenkman, Elizabeth A, Guo, Yi, Bian, Jiang, Wu, Yonghui
There are both enormous enthusiasm and concerns about using large language models (LLMs) in healthcare, yet current assessments are all based on general-purpose LLMs such as ChatGPT. This study develops a clinical generative LLM, GatorTronGPT, using 277 billion words of mixed clinical and English text with a GPT-3 architecture of 20 billion parameters. GatorTronGPT improves biomedical natural language processing for medical research. NLP models trained on synthetic text generated by GatorTronGPT outperform NLP models trained on real-world clinical text. A physicians' Turing test using a 1 (worst) to 9 (best) scale shows no significant difference in linguistic readability (p = 0.22; 6.57 for GatorTronGPT versus 6.93 for human) or clinical relevance (p = 0.91; 7.0 for GatorTronGPT versus 6.97 for human), and that physicians cannot differentiate the two (p < 0.001). This study provides insights on the opportunities and challenges of LLMs for medical research and healthcare.