Identification of Predictive Sub-Phenotypes of Acute Kidney Injury using Structured and Unstructured Electronic Health Record Data with Memory Networks Machine Learning

Acute Kidney Injury (AKI) is a common clinical syndrome characterized by the rapid loss of kidney excretory function, which aggravates the clinical severity of other diseases in a large number of hospitalized patients. Accurate early prediction of AKI can enable in-time interventions and treatments. However, AKI is highly heterogeneous, thus identification of AKI sub-phenotypes can lead to an improved understanding of the disease pathophysiology and development of more targeted clinical interventions. This study used a memory network-based deep learning approach to discover predictive AKI sub-phenotypes using structured and unstructured electronic health record (EHR) data of patients before AKI diagnosis. We leveraged a real world critical care EHR corpus including 37,486 ICU stays. Our approach identified three distinct sub-phenotypes: sub-phenotype I is with an average age of 63.03$ \pm 17.25 $ years, and is characterized by mild loss of kidney excretory function (Serum Creatinne (SCr) $1.55\pm 0.34$ mg/dL, estimated Glomerular Filtration Rate Test (eGFR) $107.65\pm 54.98$ mL/min/1.73$m^2$). These patients are more likely to develop stage I AKI. Sub-phenotype II is with average age 66.81$ \pm 10.43 $ years, and was characterized by severe loss of kidney excretory function (SCr $1.96\pm 0.49$ mg/dL, eGFR $82.19\pm 55.92$ mL/min/1.73$m^2$). These patients are more likely to develop stage III AKI. Sub-phenotype III is with average age 65.07$ \pm 11.32 $ years, and was characterized moderate loss of kidney excretory function and thus more likely to develop stage II AKI (SCr $1.69\pm 0.32$ mg/dL, eGFR $93.97\pm 56.53$ mL/min/1.73$m^2$). Both SCr and eGFR are significantly different across the three sub-phenotypes with statistical testing plus postdoc analysis, and the conclusion still holds after age adjustment.

The Use of NLP to Extract Unstructured Medical Data From Text - insideBIGDATA


When working in healthcare, a lot of the relevant information for making accurate predictions and recommendations is only available in free-text clinical notes. Much of this data is trapped in free-text documents in unstructured form. This data is needed in order to make healthcare decisions. Hence, it is important to be able to extract data in the best possible way such that the information obtained can be analyzed and used. State-of-the-art NLP algorithms can extract clinical data from text using deep learning techniques such as healthcare-specific word embeddings, named entity recognition models, and entity resolution models.

Unsupervised Learning for Computational Phenotyping Machine Learning

With large volumes of health care data comes the research area of computational phenotyping, making use of techniques such as machine learning to describe illnesses and other clinical concepts from the data itself. The "traditional" approach of using supervised learning relies on a domain expert, and has two main limitations: requiring skilled humans to supply correct labels limits its scalability and accuracy, and relying on existing clinical descriptions limits the sorts of patterns that can be found. For instance, it may fail to acknowledge that a disease treated as a single condition may really have several subtypes with different phenotypes, as seems to be the case with asthma and heart disease. Some recent papers cite successes instead using unsupervised learning. This shows great potential for finding patterns in Electronic Health Records that would otherwise be hidden and that can lead to greater understanding of conditions and treatments. This work implements a method derived strongly from Lasko et al., but implements it in Apache Spark and Python and generalizes it to laboratory time-series data in MIMIC-III. It is released as an open-source tool for exploration, analysis, and visualization, available at

Named Entity Recognition for Electronic Health Records: A Comparison of Rule-based and Machine Learning Approaches Artificial Intelligence

This work investigates multiple approaches to Named Entity Recognition (NER) for text in Electronic Health Record (EHR) data. In particular, we look into the application of (i) rule-based, (ii) deep learning and (iii) transfer learning systems for the task of NER on brain imaging reports with a focus on records from patients with stroke. We explore the strengths and weaknesses of each approach, develop rules and train on a common dataset, and evaluate each system's performance on common test sets of Scottish radiology reports from two sources (brain imaging reports in ESS -- Edinburgh Stroke Study data collected by NHS Lothian as well as radiology reports created in NHS Tayside). Our comparison shows that a hand-crafted system is the most accurate way to automatically label EHR, but machine learning approaches can provide a feasible alternative where resources for a manual system are not readily available.

Google DeepMind's 1.6m UK medical record slurp 'legally inappropriate'


Google's use of Brits' medical records to train an AI and treat people was legally "inappropriate," says Dame Fiona Caldicott, the National Data Guardian at the UK's Department of Health. In April 2016 it was revealed the web giant had signed a deal with the Royal Free Hospital in London to build an artificially intelligent application called Streams, which would analyze patients' records and identify those who had acute kidney damage. As part of the agreement, the hospital handed over 1.6 million sets of NHS medical files to DeepMind, Google's highly secretive machine-learning nerve center. However, not every patient was aware that their data was being given to Google to train the Streams AI model. And the software was supposed to be used only as a trial – an experiment with software-driven diagnosis – yet it was ultimately used to detect kidney injuries in people and alert clinicians that they needed treatment.