MedJEx: A Medical Jargon Extraction Model with Wiki's Hyperlink Span and Contextualized Masked Language Model Score

Kwon, Sunjae, Yao, Zonghai, Jordan, Harmon S., Levy, David A., Corner, Brian, Yu, Hong

Oct-11-2022–arXiv.org Artificial Intelligence

This paper proposes a new natural language processing (NLP) application for identifying medical jargon terms potentially difficult for patients to comprehend from electronic health record (EHR) notes. We first present a novel and publicly available dataset with expert-annotated medical jargon terms from 18K+ EHR note sentences ($MedJ$). Then, we introduce a novel medical jargon extraction ($MedJEx$) model which has been shown to outperform existing state-of-the-art NLP models. First, MedJEx improved the overall performance when it was trained on an auxiliary Wikipedia hyperlink span dataset, where hyperlink spans provide additional Wikipedia articles to explain the spans (or terms), and then fine-tuned on the annotated MedJ data. Secondly, we found that a contextualized masked language model score was beneficial for detecting domain-specific unfamiliar jargon terms. Moreover, our results show that training on the auxiliary Wikipedia hyperlink span datasets improved six out of eight biomedical named entity recognition benchmark datasets. Both MedJ and MedJEx are publicly available.

artificial intelligence, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

Oct-11-2022

arXiv.org PDF

Add feedback

Country:
- North America > United States
  - Minnesota > Hennepin County > Minneapolis (0.14)
- Europe
  - France (0.04)
  - Switzerland > Basel-City
    - Basel (0.04)
  - Finland > Uusimaa
    - Helsinki (0.04)
- Asia > Middle East
  - Jordan (0.04)

Genre:
- Research Report
  - New Finding (1.00)
  - Experimental Study (1.00)

Industry:
- Health & Medicine
  - Health Care Technology > Medical Record (0.69)
  - Health Care Providers & Services (0.68)
  - Consumer Health (0.68)
  - Therapeutic Area
    - Neurology (0.46)
    - Hematology (0.46)
    - Cardiology/Vascular Diseases (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Text Processing (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.46)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found