Training without training data: Improving the generalizability of automated medical abbreviation disambiguation

Skreta, Marta, Arbabi, Aryan, Wang, Jixuan, Brudno, Michael

Dec-12-2019, arXiv.org Machine Learning

Proceedings of Machine Learning Research XX:1-12, 2019
Machine Learning for Health (ML4H) at NeurIPS 2019

Marta Skreta 1,2 martaskreta@cs.toronto.edu
Michael Brudno 1,2 brudno@cs.toronto.edu

1 University of Toronto, Department of Computer Science
2 The Hospital for Sick Children, Center for Computational Medicine
3 Vector Institute for Artificial Intelligence, Toronto, Canada

Abstract

Abbreviation disambiguation is important for automated clinical note processing because abbreviations are used so frequently in clinical settings. Current models for automated abbreviation disambiguation are limited by the scarcity and imbalance of labeled training data, which decreases their generalizability to orthogonal sources. In this work we propose a novel data augmentation technique that utilizes information from related medical concepts and improves our model's ability to generalize. Furthermore, we show that incorporating global context information from the whole medical note, in addition to the traditional local context window, can significantly improve the model's representation of abbreviations. We train our model on a public dataset (MIMIC III) and test its performance on datasets from different sources (CASI, i2b2). Together, these two techniques boost the accuracy of abbreviation disambiguation by almost 14% on the CASI dataset and 4% on i2b2.

1. Introduction

Health care practitioners typically use abbreviations when preparing clinical records, saving time and space at the cost of increased ambiguity.

abbreviation, disambiguation, expansion

Country:
- North America
  - United States > Minnesota (0.04)
  - Canada > Ontario
    - Toronto (1.00)

Genre:
- Research Report (0.83)

Industry:
- Health & Medicine
  - Therapeutic Area (0.95)
  - Health Care Technology > Medical Record (0.55)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning (1.00)
  - Representation & Reasoning > Ontologies (0.32)
