Large Language Models are Few-Shot Clinical Information Extractors

Agrawal, Monica, Hegselmann, Stefan, Lang, Hunter, Kim, Yoon, Sontag, David

arXiv.org Artificial Intelligence, Nov-30-2022

A long-running goal of the clinical NLP community is the extraction of important variables trapped in clinical notes. However, roadblocks have included dataset shift from the general domain and a lack of public clinical corpora and annotations. In this work, we show that large language models, such as InstructGPT, perform well at zero- and few-shot information extraction from clinical text despite not being trained specifically for the clinical domain. Whereas text classification and generation performance have already been studied extensively in such models, here we additionally demonstrate how to leverage them to tackle a diverse set of NLP tasks which require more structured outputs, including span identification, token-level sequence classification, and relation extraction. Further, due to the dearth of available data to evaluate these systems, we introduce new datasets for benchmarking few-shot clinical information extraction based on a manual re-annotation of the CASI dataset for new tasks. On the clinical extraction tasks we studied, the GPT-3 systems significantly outperform existing zero- and few-shot baselines.
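To make the idea of getting structured output from a text-generating model concrete, here is a minimal sketch of span identification by few-shot prompting. It is an illustration under stated assumptions, not the authors' pipeline: the medication-mention task, the prompt layout, the `;` separator, and the helper names `build_prompt` and `parse_spans` are all hypothetical, and the model call is replaced by a hard-coded completion string so the example runs offline.

```python
from typing import List, Tuple


def build_prompt(examples: List[Tuple[str, List[str]]], note: str) -> str:
    """Assemble a few-shot prompt: each demonstration pairs a note with its
    medication list, and the final block is left open for the model to fill.
    The "Note:/Medications:" layout is an assumption made for this sketch."""
    parts = []
    for text, meds in examples:
        answer = "; ".join(meds) if meds else "none"
        parts.append(f"Note: {text}\nMedications: {answer}")
    parts.append(f"Note: {note}\nMedications:")
    return "\n\n".join(parts)


def parse_spans(completion: str, note: str) -> List[Tuple[int, int, str]]:
    """Map a free-text completion back onto character spans in the note.
    Terms that do not occur in the note are dropped, which filters out
    anything the model generated without support in the text."""
    spans = []
    for item in completion.split(";"):
        term = item.strip()
        if not term or term.lower() == "none":
            continue
        start = note.lower().find(term.lower())
        if start >= 0:
            end = start + len(term)
            spans.append((start, end, note[start:end]))
    return spans


# One hypothetical demonstration and one target note.
examples = [("Pt started on lisinopril 10 mg daily.", ["lisinopril"])]
note = "Continue metformin; hold warfarin before surgery."
prompt = build_prompt(examples, note)

# Stand-in for a model response; "aspirin" is not in the note and is dropped.
completion = " metformin; warfarin; aspirin"
spans = parse_spans(completion, note)
print(spans)
```

Matching each generated term back to the source text is one simple way to turn a free-form completion into spans; it only finds the first occurrence of each term, so repeated mentions would need extra handling.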

Country:
- North America > United States
  - Minnesota (0.04)
  - Massachusetts > Middlesex County
    - Cambridge (0.04)
  - Louisiana > Orleans Parish
    - New Orleans (0.04)
- Europe
  - Middle East > Republic of Türkiye
    - Istanbul Province > Istanbul (0.04)
  - Italy > Tuscany
    - Florence (0.04)
- Asia > Middle East
  - Republic of Türkiye > Istanbul Province > Istanbul (0.04)

Genre:
- Research Report
  - New Finding (1.00)
  - Experimental Study (1.00)

Industry:
- Health & Medicine
  - Pharmaceuticals & Biotechnology (1.00)
  - Diagnostic Medicine (1.00)
  - Consumer Health (1.00)
  - Health Care Technology > Medical Record (0.88)
  - Therapeutic Area
    - Cardiology/Vascular Diseases (1.00)
    - Pulmonary/Respiratory Diseases (0.68)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
