### HYPE: A High Performing NLP System for Automatically Detecting Hypoglycemia Events from Electronic Health Record Notes

Hypoglycemia is common and potentially dangerous among those treated for diabetes. Electronic health records (EHRs) are important resources for hypoglycemia surveillance. In this study, we report the development and evaluation of deep learning-based natural language processing systems to automatically detect hypoglycemia events from the EHR narratives. Experts in Public Health annotated 500 EHR notes from patients with diabetes. We used this annotated dataset to train and evaluate HYPE, supervised NLP systems for hypoglycemia detection. In our experiment, the convolutional neural network model yielded promising performance $Precision=0.96 \pm 0.03, Recall=0.86 \pm 0.03, F1=0.91 \pm 0.03$ in a 10-fold cross-validation setting. Despite the annotated data is highly imbalanced, our CNN-based HYPE system still achieved a high performance for hypoglycemia detection. HYPE could be used for EHR-based hypoglycemia surveillance and to facilitate clinicians for timely treatment of high-risk patients.

### Predicting readmission risk from doctors' notes

We develop a model using deep learning techniques and natural language processing on unstructured text from medical records to predict hospital-wide $30$-day unplanned readmission, with c-statistic $.70$. Our model is constructed to allow physicians to interpret the significant features for prediction.

### Hybrid Gradient Boosting Trees and Neural Networks for Forecasting Operating Room Data

Time series data constitutes a distinct and growing problem in machine learning. As the corpus of time series data grows larger, deep models that simultaneously learn features and classify with these features can be intractable or suboptimal. In this paper, we present feature learning via long short term memory (LSTM) networks and prediction via gradient boosting trees (XGB). Focusing on the consequential setting of electronic health record data, we predict the occurrence of hypoxemia five minutes into the future based on past features. We make two observations: 1) long short term memory networks are effective at capturing long term dependencies based on a single feature and 2) gradient boosting trees are capable of tractably combining a large number of features including static features like height and weight. With these observations in mind, we generate features by performing "supervised" representation learning with LSTM networks. Augmenting the original XGB model with these features gives significantly better performance than either individual method.

### Identification of Predictive Sub-Phenotypes of Acute Kidney Injury using Structured and Unstructured Electronic Health Record Data with Memory Networks

Acute Kidney Injury (AKI) is a common clinical syndrome characterized by the rapid loss of kidney excretory function, which aggravates the clinical severity of other diseases in a large number of hospitalized patients. Accurate early prediction of AKI can enable in-time interventions and treatments. However, AKI is highly heterogeneous, thus identification of AKI sub-phenotypes can lead to an improved understanding of the disease pathophysiology and development of more targeted clinical interventions. This study used a memory network-based deep learning approach to discover predictive AKI sub-phenotypes using structured and unstructured electronic health record (EHR) data of patients before AKI diagnosis. We leveraged a real world critical care EHR corpus including 37,486 ICU stays. Our approach identified three distinct sub-phenotypes: sub-phenotype I is with an average age of 63.03$\pm 17.25$ years, and is characterized by mild loss of kidney excretory function (Serum Creatinne (SCr) $1.55\pm 0.34$ mg/dL, estimated Glomerular Filtration Rate Test (eGFR) $107.65\pm 54.98$ mL/min/1.73$m^2$). These patients are more likely to develop stage I AKI. Sub-phenotype II is with average age 66.81$\pm 10.43$ years, and was characterized by severe loss of kidney excretory function (SCr $1.96\pm 0.49$ mg/dL, eGFR $82.19\pm 55.92$ mL/min/1.73$m^2$). These patients are more likely to develop stage III AKI. Sub-phenotype III is with average age 65.07$\pm 11.32$ years, and was characterized moderate loss of kidney excretory function and thus more likely to develop stage II AKI (SCr $1.69\pm 0.32$ mg/dL, eGFR $93.97\pm 56.53$ mL/min/1.73$m^2$). Both SCr and eGFR are significantly different across the three sub-phenotypes with statistical testing plus postdoc analysis, and the conclusion still holds after age adjustment.