Li, Irene, Pan, Jessica, Goldwasser, Jeremy, Verma, Neha, Wong, Wai Pan, Nuzumlalı, Muhammed Yavuz, Rosand, Benjamin, Li, Yixin, Zhang, Matthew, Chang, David, Taylor, R. Andrew, Krumholz, Harlan M., Radev, Dragomir
Electronic health records (EHRs), digital collections of patient healthcare events and observations, are ubiquitous in medicine and critical to healthcare delivery, operations, and research. Despite this central role, EHRs are notoriously difficult to process automatically. Well over half of the information stored within EHRs is in the form of unstructured text (e.g. provider notes, operation reports) and remains largely untapped for secondary use. Recently, however, newer neural network and deep learning approaches to Natural Language Processing (NLP) have made considerable advances, outperforming traditional statistical and rule-based systems on a variety of tasks. In this survey paper, we summarize current neural NLP methods for EHR applications. We focus on a broad scope of tasks, namely, classification and prediction, word embeddings, extraction, generation, and other topics such as question answering, phenotyping, knowledge graphs, medical dialogue, multilinguality, interpretability, etc.
We present a supervised-learning algorithm from graph data (a set of graphs) for arbitrary twice-differentiable loss functions and sparse linear models over all possible subgraph features. To date, it has been shown that under all possible subgraph features, several types of sparse learning, such as Adaboost, LPBoost, LARS/LASSO, and sparse PLS regression, can be performed. Particularly emphasis is placed on simultaneous learning of relevant features from an infinite set of candidates. We first generalize techniques used in all these preceding studies to derive an unifying bounding technique for arbitrary separable functions. We then carefully use this bounding to make block coordinate gradient descent feasible over infinite subgraph features, resulting in a fast converging algorithm that can solve a wider class of sparse learning problems over graph data. We also empirically study the differences from the existing approaches in convergence property, selected subgraph features, and search-space sizes. We further discuss several unnoticed issues in sparse learning over all possible subgraph features.