How Long Is Enough? Exploring the Optimal Intervals of Long-Range Clinical Note Language Modeling

Cahyawijaya, Samuel; Wilie, Bryan; Lovenia, Holy; Zhong, Huan; Zhong, MingQian; Ip, Yuk-Yu Nancy; Fung, Pascale

Oct-25-2022–arXiv.org Artificial Intelligence

Large pre-trained language models (LMs) have been widely adopted in the biomedical and clinical domains, which has led to many powerful LMs such as bio-lm and BioELECTRA. However, the applicability of these methods to real clinical use cases is hindered by the limited ability of pre-trained LMs to process long textual data with thousands of words, a common length for a clinical note. In this work, we explore long-range adaptation of such LMs with Longformer, allowing the LMs to capture the context of longer clinical notes. We conduct experiments on three n2c2 challenge datasets and a longitudinal clinical dataset from the Hong Kong Hospital Authority electronic health record (EHR) system to show the effectiveness and generalizability of this concept, achieving a 10% F1-score improvement. Based on our experiments, we conclude that capturing a longer clinical note interval is beneficial to model performance, but the cut-off interval that yields optimal performance differs across target variables. Our code is available at https://github.com/HLTCHKUST/long-biomedical-model.
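The abstract names Longformer as the long-range adaptation mechanism but does not spell out the conversion. As a rough illustration only, the toy sketch below shows two ingredients commonly associated with turning a short-context encoder into a Longformer-style model, following the recipe described in the original Longformer paper: tiling the pretrained position-embedding table out to a longer maximum length, and restricting self-attention to a local sliding window plus a few global tokens. The function names `extend_position_embeddings` and `sliding_window_mask`, the toy table, the window size, and the assumption that the position table has no padding-index offset are all hypothetical choices for illustration, not details taken from this paper or its repository.

```python
from typing import Iterable, List, Sequence

Matrix = List[List[float]]


def extend_position_embeddings(pos_emb: Sequence[Sequence[float]], new_max_len: int) -> Matrix:
    """Return a table with `new_max_len` rows built by tiling the pretrained rows.

    Row i of the result is a copy of row (i mod old_len) of the original table,
    so positions beyond the pretrained limit start from learned embeddings
    instead of random initialisation. Assumption: no padding-index offset.
    """
    old_len = len(pos_emb)
    if old_len == 0:
        raise ValueError("position-embedding table is empty")
    return [list(pos_emb[i % old_len]) for i in range(new_max_len)]


def sliding_window_mask(seq_len: int, window: int,
                        global_positions: Iterable[int] = ()) -> List[List[bool]]:
    """Boolean attention mask: mask[i][j] is True if token i may attend to token j.

    Each token sees neighbours within window // 2 positions on either side;
    tokens in `global_positions` attend to, and are attended by, every token.
    """
    half = window // 2
    glob = set(global_positions)
    return [
        [abs(i - j) <= half or i in glob or j in glob for j in range(seq_len)]
        for i in range(seq_len)
    ]


# Toy usage: a hypothetical 4-position, 2-dimensional "pretrained" table extended to 10 positions.
table = [[float(p), float(p) * 10] for p in range(4)]
extended = extend_position_embeddings(table, 10)
print(len(extended), extended[4] == table[0], extended[9] == table[1])  # 10 True True

# 6-token sequence, window of 2 (one neighbour per side), token 0 global (e.g. a [CLS] token).
mask = sliding_window_mask(6, window=2, global_positions=[0])
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

Tiling rather than randomly initialising the new rows lets the longer model start from its pretrained positional knowledge, while the banded mask keeps attention cost roughly linear in sequence length. This sketch only illustrates the data layout; it does not replace the attention layers of an actual model or perform any training.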

artificial intelligence, machine learning, natural language

Country:
- South America > Chile
  - Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America
  - United States > Minnesota
    - Hennepin County > Minneapolis (0.14)
  - Canada > Alberta
    - Census Division No. 15 > Improvement District No. 9 > Banff (0.04)
- Europe
  - Italy > Tuscany
    - Florence (0.04)
  - Ireland > Leinster
    - County Dublin > Dublin (0.04)
  - Belgium > Brussels-Capital Region
    - Brussels (0.04)
- Asia > China
  - Hong Kong > Kowloon (0.04)

Genre:
- Research Report > New Finding (0.46)

Industry:
- Health & Medicine
  - Health Care Technology > Medical Record (1.00)
  - Therapeutic Area > Endocrinology
    - Diabetes (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Chatbot (0.64)
  - Machine Learning > Neural Networks
    - Deep Learning (0.46)