Writing habits and telltale neighbors: analyzing clinical concept usage patterns with sublanguage embeddings

Denis Newman-Griffis, Eric Fosler-Lussier

Oct-1-2019, arXiv.org Artificial Intelligence

We present a method for characterizing the usage patterns of clinical concepts among different document types, in order to capture semantic differences beyond the lexical level. By training concept embeddings on clinical documents of different types and measuring the differences in their nearest neighborhood structures, we are able to measure divergences in concept usage while correcting for noise in embedding learning. Experiments on the MIMIC-III corpus demonstrate that our approach captures clinically relevant differences in concept usage and provides an intuitive way to explore semantic characteristics of clinical document collections.

1 Introduction

Sublanguage analysis has played a pivotal role in natural language processing of health data, from highlighting the clear linguistic differences between biomedical literature and clinical text (Friedman et al., 2002) to supporting adaptation to multiple languages (Laippala et al., 2009). Recent studies of clinical sublanguage have extended sublanguage study to the document type level, in order to improve our understanding of the syntactic and lexical differences between highly distinct document types used in modern EHR systems (Feldman et al., 2016; Grön et al., 2019). However, one key axis of sublanguage characterization that has not yet been explored is how domain-specific clinical concepts differ in their usage patterns among different document types.
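
The neighborhood comparison outlined in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes cosine similarity for ranking neighbors and Jaccard overlap of the top-k neighbor sets as the divergence signal. The concept identifiers, the toy two-dimensional vectors, and the function names (`nearest_neighbors`, `neighbor_overlap`) are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbors(embeddings, concept, k):
    """Return the set of the k concepts most similar to `concept`."""
    query = embeddings[concept]
    scored = [
        (cosine(query, vec), name)
        for name, vec in embeddings.items()
        if name != concept
    ]
    # Sort by descending similarity; break ties by name for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return {name for _, name in scored[:k]}


def neighbor_overlap(emb_a, emb_b, concept, k):
    """Jaccard overlap of a concept's top-k neighbor sets in two spaces."""
    neighbors_a = nearest_neighbors(emb_a, concept, k)
    neighbors_b = nearest_neighbors(emb_b, concept, k)
    union = neighbors_a | neighbors_b
    return len(neighbors_a & neighbors_b) / len(union) if union else 1.0


# Toy embeddings standing in for two document types (hypothetical values).
discharge = {
    "C_heart_failure": [1.0, 0.0],
    "C_edema": [0.9, 0.1],
    "C_diuretic": [0.8, 0.3],
    "C_fracture": [0.0, 1.0],
    "C_cast": [0.1, 0.9],
}
nursing = {
    "C_heart_failure": [1.0, 0.0],
    "C_edema": [0.9, 0.2],
    "C_diuretic": [0.1, 1.0],
    "C_fracture": [0.7, 0.3],
    "C_cast": [0.0, 1.0],
}

overlap = neighbor_overlap(discharge, nursing, "C_heart_failure", k=2)
print(f"top-2 neighbor overlap: {overlap:.3f}")
```

A low overlap suggests the concept keeps different company in the two document types. One way to account for the embedding noise the abstract mentions, stated here as an assumption rather than the paper's procedure, would be to compute the same overlap between replicate embeddings trained on a single document type and use that as a baseline.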

consistency, document type, neighbor


