Improving Generalizability of Extracting Social Determinants of Health Using Large Language Models through Prompt-tuning
Cheng Peng, Zehao Yu, Kaleb E. Smith, Wei-Hsuan Lo-Ciganic, Jiang Bian, Yonghui Wu
arXiv.org Artificial Intelligence
Progress in natural language processing (NLP) with large language models (LLMs) has greatly improved patient information extraction from clinical narratives. However, most methods based on the fine-tuning strategy have limited transfer learning ability for cross-domain applications. This study proposed a novel approach that employs a soft prompt-based learning architecture, which introduces trainable prompts to guide LLMs toward desired outputs. We examined two types of LLM architectures, encoder-only GatorTron and decoder-only GatorTronGPT, and evaluated their performance for the extraction of social determinants of health (SDoH) using a cross-institution dataset from the 2022 n2c2 challenge and a cross-disease dataset from University of Florida (UF) Health. The results show that decoder-only LLMs with prompt tuning achieved better performance in cross-domain applications. GatorTronGPT achieved the best F1 scores on both datasets, outperforming traditionally fine-tuned GatorTron by 8.9% and 21.8% in the cross-institution setting, and by 5.5% and 14.5% in the cross-disease setting.
Mar-18-2024
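As a rough illustration of the technique the abstract describes (not the authors' code), the sketch below shows how soft prompt tuning is commonly implemented for an encoder-only model: trainable prompt embeddings are prepended to the frozen LLM's input embeddings, and only the prompt vectors and a small task head receive gradients. The checkpoint name, prompt length, and SDoH label count are placeholders, not values from the paper.

# Minimal soft prompt-tuning sketch for token-level SDoH extraction (assumptions noted above).
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-uncased"   # placeholder; the paper uses GatorTron / GatorTronGPT
PROMPT_LEN = 20                    # number of trainable soft-prompt vectors (assumed)
NUM_LABELS = 5                     # illustrative SDoH label count

class SoftPromptTagger(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(MODEL_NAME)
        # Freeze the LLM; only the soft prompt and the classifier head are trainable.
        for p in self.encoder.parameters():
            p.requires_grad = False
        hidden = self.encoder.config.hidden_size
        self.soft_prompt = nn.Parameter(torch.randn(PROMPT_LEN, hidden) * 0.02)
        self.classifier = nn.Linear(hidden, NUM_LABELS)

    def forward(self, input_ids, attention_mask):
        # Embed the real tokens, then prepend the trainable prompt vectors.
        tok_emb = self.encoder.get_input_embeddings()(input_ids)            # (B, T, H)
        prompt = self.soft_prompt.unsqueeze(0).expand(input_ids.size(0), -1, -1)
        inputs_embeds = torch.cat([prompt, tok_emb], dim=1)                 # (B, P+T, H)
        prompt_mask = torch.ones(input_ids.size(0), PROMPT_LEN,
                                 dtype=attention_mask.dtype, device=attention_mask.device)
        mask = torch.cat([prompt_mask, attention_mask], dim=1)
        out = self.encoder(inputs_embeds=inputs_embeds, attention_mask=mask)
        token_states = out.last_hidden_state[:, PROMPT_LEN:, :]             # drop prompt positions
        return self.classifier(token_states)                                # per-token SDoH logits

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = SoftPromptTagger()
batch = tokenizer(["Patient lives alone and reports heavy alcohol use."],
                  return_tensors="pt", padding=True)
logits = model(batch["input_ids"], batch["attention_mask"])
print(logits.shape)  # (1, sequence_length, NUM_LABELS)

For a decoder-only model such as GatorTronGPT, the same idea applies: the trainable prompt vectors are prepended to the input embeddings and the frozen decoder generates the target output conditioned on them.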