Collaborating Authors


$\mathtt{MedGraph:}$ Structural and Temporal Representation Learning of Electronic Medical Records Machine Learning

Electronic medical record (EMR) data contains historical sequences of visits of patients, and each visit contains rich information, such as patient demographics, hospital utilisation and medical codes, including diagnosis, procedure and medication codes. Most existing EMR embedding methods capture visit-code associations by constructing input visit representations as binary vectors with a static vocabulary of medical codes. With this limited representation, they fail in encapsulating rich attribute information of visits (demographics and utilisation information) and/or codes (e.g., medical code descriptions). Furthermore, current work considers visits of the same patient as discrete-time events and ignores time gaps between them. However, the time gaps between visits depict dynamics of the patient's medical history inducing varying influences on future visits. To address these limitations, we present $\mathtt{MedGraph}$, a supervised EMR embedding method that captures two types of information: (1) the visit-code associations in an attributed bipartite graph, and (2) the temporal sequencing of visits through point processes. $\mathtt{MedGraph}$ produces Gaussian embeddings for visits and codes to model the uncertainty. We evaluate the performance of $\mathtt{MedGraph}$ through an extensive experimental study and show that $\mathtt{MedGraph}$ outperforms state-of-the-art EMR embedding methods in several medical risk prediction tasks.

Machine learning in healthcare -- a system's perspective Artificial Intelligence

A consequence of the fragmented and siloed healthcare landscape is that patient care (and data) is split along multitude of different facilities and computer systems and enabling interoperability between these systems is hard. The lack interoperability not only hinders continuity of care and burdens providers, but also hinders effective application of Machine Learning (ML) algorithms. Thus, most current ML algorithms, designed to understand patient care and facilitate clinical decision-support, are trained on limited datasets. This approach is analogous to the Newtonian paradigm of Reductionism in which a system is broken down into elementary components and a description of the whole is formed by understanding those components individually. A key limitation of the reductionist approach is that it ignores the component-component interactions and dynamics within the system which are often of prime significance in understanding the overall behaviour of complex adaptive systems (CAS). Healthcare is a CAS. Though the application of ML on health data have shown incremental improvements for clinical decision support, ML has a much a broader potential to restructure care delivery as a whole and maximize care value. However, this ML potential remains largely untapped: primarily due to functional limitations of Electronic Health Records (EHR) and the inability to see the healthcare system as a whole. This viewpoint (i) articulates the healthcare as a complex system which has a biological and an organizational perspective, (ii) motivates with examples, the need of a system's approach when addressing healthcare challenges via ML and, (iii) emphasizes to unleash EHR functionality - while duly respecting all ethical and legal concerns - to reap full benefits of ML.

Automatic end-to-end De-identification: Is high accuracy the only metric? Machine Learning

De-identification of electronic health records (EHR) is a vital step towards advancing health informatics research and maximising the use of available data. It is a two-step process where step one is the identification of protected health information (PHI), and step two is replacing such PHI with surrogates. Despite the recent advances in automatic de-identification of EHR, significant obstacles remain if the abundant health data available are to be used to the full potential. Accuracy in de-identification could be considered a necessary, but not sufficient condition for the use of EHR without individual patient consent. We present here a comprehensive review of the progress to date, both the impressive successes in achieving high accuracy and the significant risks and challenges that remain. To best of our knowledge, this is the first paper to present a complete picture of end-to-end automatic de-identification. We review 18 recently published automatic de-identification systems -designed to de-identify EHR in the form of free text- to show the advancements made in improving the overall accuracy of the system, and in identifying individual PHI. We argue that despite the improvements in accuracy there remain challenges in surrogate generation and replacements of identified PHIs, and the risks posed to patient protection and privacy.

Artificial intelligence could revive the art of medicine


Doctors practice medicine to deliver care, not do data entry. Yet in the era of electronic medical records (EMRs), for every hour spent with a patient, physicians spend nearly two hours on paperwork. What if technology could take care of the paperwork for us? Record-keeping systems in health care were built for back-office functions, not bedside medicine. Most EMR vendors started out building products to collect payments and schedule appointments.

EHRs Connect Research and Practice: Where Predictive Modeling, Artificial Intelligence, and Clinical Decision Support Intersect Machine Learning

Objectives: Electronic health records (EHRs) are only a first step in capturing and utilizing health-related data - the challenge is turning that data into useful information. Furthermore, EHRs are increasingly likely to include data relating to patient outcomes, functionality such as clinical decision support, and genetic information as well, and, as such, can be seen as repositories of increasingly valuable information about patients' health conditions and responses to treatment over time. Methods: We describe a case study of 423 patients treated by Centerstone within Tennessee and Indiana in which we utilized electronic health record data to generate predictive algorithms of individual patient treatment response. Multiple models were constructed using predictor variables derived from clinical, financial and geographic data. Results: For the 423 patients, 101 deteriorated, 223 improved and in 99 there was no change in clinical condition. Based on modeling of various clinical indicators at baseline, the highest accuracy in predicting individual patient response ranged from 70-72% within the models tested. In terms of individual predictors, the Centerstone Assessment of Recovery Level - Adult (CARLA) baseline score was most significant in predicting outcome over time (odds ratio 4.1 + 2.27). Other variables with consistently significant impact on outcome included payer, diagnostic category, location and provision of case management services. Conclusions: This approach represents a promising avenue toward reducing the current gap between research and practice across healthcare, developing data-driven clinical decision support based on real-world populations, and serving as a component of embedded clinical artificial intelligences that "learn" over time.