Writing habits and telltale neighbors: analyzing clinical concept usage patterns with sublanguage embeddings

Denis Newman-Griffis, Eric Fosler-Lussier

arXiv.org Artificial Intelligence 

We present a method for characterizing the usage patterns of clinical concepts among different document types, in order to capture semantic differences beyond the lexical level. By training concept embeddings on clinical documents of different types and measuring the differences in their nearest neighborhood structures, we are able to measure divergences in concept usage while correcting for noise in embedding learning. Experiments on the MIMIC-III corpus demonstrate that our approach captures clinically relevant differences in concept usage and provides an intuitive way to explore semantic characteristics of clinical document collections.

1 Introduction

Sublanguage analysis has played a pivotal role in natural language processing of health data, from highlighting the clear linguistic differences between biomedical literature and clinical text (Friedman et al., 2002) to supporting adaptation to multiple languages (Laippala et al., 2009). Recent studies of clinical sublanguage have extended sublanguage study to the document type level, in order to improve our understanding of the syntactic and lexical differences between highly distinct document types used in modern EHR systems (Feldman et al., 2016; Grön et al., 2019). However, one key axis of sublanguage characterization that has not yet been explored is how domain-specific clinical concepts differ in their usage patterns among different document types.
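The abstract's core idea, comparing a concept's nearest neighbors across embeddings trained on different document types, can be illustrated with a minimal sketch. This is not the authors' measure: the Jaccard overlap, the function names, the concept IDs, and the toy vectors below are all assumptions made for illustration, and the noise correction mentioned in the abstract is not modeled here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbors(embeddings, query, k):
    """Return the set of the k concepts most cosine-similar to `query`."""
    scored = [
        (cosine(embeddings[query], vec), name)
        for name, vec in embeddings.items()
        if name != query
    ]
    scored.sort(reverse=True)
    return {name for _, name in scored[:k]}


def neighborhood_overlap(emb_a, emb_b, query, k=2):
    """Jaccard overlap of the query's k nearest neighbors in two spaces.

    Assumed stand-in for a neighborhood-divergence measure: 1.0 means the
    neighbor sets are identical, 0.0 means they are disjoint. Only concepts
    present in both embedding sets are considered.
    """
    shared = set(emb_a) & set(emb_b)
    a = {c: emb_a[c] for c in shared}
    b = {c: emb_b[c] for c in shared}
    neighbors_a = nearest_neighbors(a, query, k)
    neighbors_b = nearest_neighbors(b, query, k)
    union = neighbors_a | neighbors_b
    return len(neighbors_a & neighbors_b) / len(union) if union else 1.0


# Toy, made-up 2-d embeddings for four hypothetical concept IDs, standing in
# for embeddings trained on two different document types.
emb_type_a = {"C1": [1.0, 0.0], "C2": [0.9, 0.1], "C3": [0.8, 0.2], "C4": [0.0, 1.0]}
emb_type_b = {"C1": [1.0, 0.0], "C2": [0.0, 1.0], "C3": [0.9, 0.1], "C4": [0.8, 0.2]}

overlap = neighborhood_overlap(emb_type_a, emb_type_b, "C1", k=2)
print(f"Neighbor overlap for C1: {overlap:.3f}")
```

A low overlap for a concept would suggest that it keeps different company in the two document types. In practice, one would presumably compare against the overlap seen between repeated training runs on the same document type, so that ordinary embedding noise is not mistaken for a usage difference.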
