Secondary Use of Clinical Problem List Entries for Neural Network-Based Disease Code Assignment

Kreuzthaler, Markus, Pfeifer, Bastian, Kramer, Diether, Schulz, Stefan

arXiv.org Artificial Intelligence 

Clinical information systems have become large repositories for semi-structured and partly annotated electronic health record data, which have reached a critical mass that makes them interesting for supervised data-driven neural network approaches. We explored automated coding of 50 character long clinical problem list entries using the International Classification of Diseases (ICD-10) and evaluated three different types of network architectures on the top 100 ICD-10 three-digit codes. A fastText baseline reached a macro-averaged F1-score of 0.83, followed by a character-level LSTM with a macro-averaged F1-score of 0.84. The top performing approach used a downstreamed RoBERTa model with a custom language model, yielding a macro-averaged F1-score of 0.88. A neural network activation analysis together with an investigation of the false positives and false negatives unveiled inconsistent manual coding as a main limiting factor.

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found