Medical Record

Learning how to approve updates to machine learning algorithms in non-stationary settings (Machine Learning)

Machine learning algorithms in healthcare have the potential to continually learn from real-world data generated during healthcare delivery and to adapt to dataset shifts. As such, the FDA is looking to design policies that can autonomously approve modifications to machine learning algorithms while maintaining or improving the safety and effectiveness of the deployed models. However, selecting a fixed approval strategy a priori can be difficult, because its performance depends on the stationarity of the data and the quality of the proposed modifications. To this end, we investigate a learning-to-approve approach (L2A) that uses accumulating monitoring data to learn how to approve modifications. L2A defines a family of strategies that vary in their "optimism" (more optimistic policies have faster approval rates) and searches over this family using an exponentially weighted average forecaster. To control the cumulative risk of the deployed model, we give L2A the option to abstain from making a prediction and incur a fixed abstention cost instead. We derive bounds on the average risk of the model deployed by L2A, assuming the distributional shifts are smooth. In simulation studies and empirical analyses, L2A tailors its level of optimism to each problem setting: it learns to abstain when performance drops are common and to approve beneficial modifications quickly when the distribution is stable.
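The search over strategies can be illustrated with a minimal sketch of a generic exponentially weighted average forecaster (Hedge) over a finite set of approval strategies. The per-round losses, the learning rate `eta`, the abstention cost, and the helper names are assumptions for illustration, not the authors' implementation; in L2A the losses come from monitoring the risk of the model each strategy would deploy.

```python
import math


def strategy_loss(model_risk, abstains, abstention_cost):
    """Loss a strategy incurs in one round: the fixed abstention cost if it
    abstains, otherwise the risk of the model it would deploy (assumed in [0, 1])."""
    return abstention_cost if abstains else model_risk


def hedge_weights(cumulative_losses, eta):
    """Exponential weights w_i proportional to exp(-eta * L_i), normalized to sum to 1."""
    best = min(cumulative_losses)  # shift by the minimum for numerical stability
    raw = [math.exp(-eta * (loss - best)) for loss in cumulative_losses]
    total = sum(raw)
    return [r / total for r in raw]


def run_forecaster(loss_rounds, eta):
    """loss_rounds[t][i] is strategy i's loss at round t.

    Returns the forecaster's weighted-average loss for each round (using the
    weights available before that round's losses are revealed) and the final
    weights after all rounds."""
    cumulative = [0.0] * len(loss_rounds[0])
    mixed = []
    for losses in loss_rounds:
        weights = hedge_weights(cumulative, eta)
        mixed.append(sum(w * l for w, l in zip(weights, losses)))
        cumulative = [c + l for c, l in zip(cumulative, losses)]
    return mixed, hedge_weights(cumulative, eta)


# Toy example with two hypothetical strategies on a stable distribution:
# an optimistic one that approves and deploys a model with risk 0.1,
# and a pessimistic one that always abstains at cost 0.3.
abstention_cost = 0.3
rounds = [
    [strategy_loss(0.1, False, abstention_cost),
     strategy_loss(0.1, True, abstention_cost)]
    for _ in range(3)
]
mixed, final_weights = run_forecaster(rounds, eta=1.0)
print([round(m, 3) for m in mixed], [round(w, 3) for w in final_weights])
```

Because the toy distribution is stable, weight drifts toward the optimistic strategy and the forecaster's per-round loss falls. If the approved models frequently had risk above the abstention cost, weight would instead drift toward abstaining.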

Semi-Supervised Off Policy Reinforcement Learning (Machine Learning)

Reinforcement learning (RL) has shown great success in estimating sequential treatment strategies that account for patient heterogeneity. However, health-outcome information is often not well coded but rather embedded in clinical notes. Extracting precise outcome information is resource intensive, so only small well-annotated cohorts are typically available. We propose a semi-supervised learning (SSL) approach that can efficiently leverage a small labeled dataset $\mathcal{L}$, in which the true outcome is observed, and a large unlabeled dataset $\mathcal{U}$ with outcome surrogates $\pmb W$. In particular, we propose a theoretically justified SSL approach to $Q$-learning and develop a robust and efficient SSL approach to estimating the value function of the derived optimal STR, defined as the expected counterfactual outcome under the optimal STR. Generalizing SSL to learning STRs brings interesting challenges. First, the feature distribution for predicting $Y_t$ is unknown in the $Q$-learning procedure, as it includes the unknown $Y_{t-1}$ due to the sequential nature of the problem. Our methods for estimating the optimal STR and its associated value function carefully adapt to this sequentially missing data structure. Second, we modify the SSL framework to handle surrogate variables $\pmb W$, which are predictive of the outcome through the joint law $\mathbb{P}_{Y,\pmb O,\pmb W}$ but are not part of the conditional distribution of interest, $\mathbb{P}_{Y|\pmb O}$. We provide theoretical results to understand when and to what degree efficiency can be gained from $\pmb W$ and $\pmb O$. Our approach is robust to misspecification of the imputation models. Further, we provide a doubly robust value function estimator for the derived STR: if either the $Q$ functions or the propensity score functions are correctly specified, our value function estimators are consistent for the true value function.
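The doubly robust property can be illustrated with a minimal single-stage, fully labeled sketch of the standard augmented inverse-probability-weighted (AIPW) value estimator for a fixed treatment rule. It deliberately omits the paper's multi-stage $Q$-learning and the semi-supervised imputation with surrogates $\pmb W$; the function names and toy data are hypothetical.

```python
def dr_value(records, rule, q_fn, propensity_fn):
    """Single-stage AIPW (doubly robust) estimate of the value of `rule`.

    records: iterable of (x, a, y) with covariate x, binary action a, outcome y.
    rule(x): action recommended by the treatment rule.
    q_fn(x, a): outcome-regression (Q function) estimate of E[Y | X=x, A=a].
    propensity_fn(x, a): estimate of P(A=a | X=x), assumed > 0.
    """
    records = list(records)
    total = 0.0
    for x, a, y in records:
        d = rule(x)
        q_rule = q_fn(x, d)
        # Augmentation term: only records whose observed action matches the rule
        # contribute, reweighted by the inverse propensity of that action.
        correction = (y - q_rule) / propensity_fn(x, d) if a == d else 0.0
        total += q_rule + correction
    return total / len(records)


def treat_if_positive(x):
    """Hypothetical rule: treat (a=1) only when the covariate is positive."""
    return 1 if x > 0 else 0


# Toy data where Y = X + A exactly and each action is taken half the time,
# so the true value of the rule is (2 + (-1)) / 2 = 0.5.
data = [(1.0, 1, 2.0), (1.0, 0, 1.0), (-1.0, 0, -1.0), (-1.0, 1, 0.0)]

# Correct Q function with a deliberately wrong propensity.
v_q_correct = dr_value(data, treat_if_positive, lambda x, a: x + a, lambda x, a: 0.9)
# Wrong Q function with the correct propensity.
v_ps_correct = dr_value(data, treat_if_positive, lambda x, a: 0.0, lambda x, a: 0.5)
print(v_q_correct, v_ps_correct)
```

Either nuisance model being right is enough here, which is the single-stage analogue of the consistency statement above.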

AI Driven Knowledge Extraction from Clinical Practice Guidelines: Turning Research into Practice (Artificial Intelligence)

Background and Objectives: Clinical Practice Guidelines (CPGs) are the foremost means of sharing state-of-the-art research findings in the healthcare domain with medical practitioners, in order to limit practice variation, reduce clinical costs, improve the quality of care, and provide evidence-based treatment. However, extracting relevant knowledge from the plethora of CPGs is not feasible for already burdened healthcare professionals, leading to large gaps between clinical findings and real-world practice. It is therefore imperative that state-of-the-art computing research, especially machine learning, be used to provide artificial intelligence-based solutions for extracting knowledge from CPGs and reducing the gap between healthcare research/guidelines and practice. Methods: This research presents a novel methodology for knowledge extraction from CPGs that aims to reduce this gap and turn the latest research findings into clinical practice. First, our system classifies CPG sentences into four classes (condition-action, condition-consequences, action, and not-applicable) based on the information each sentence presents. The classification uses deep learning with state-of-the-art word embeddings built with the improved word vectors technique. Second, the system identifies qualifier terms in the classified sentences, which help recognize the condition and action phrases in a sentence. Finally, the condition and action phrases are processed and transformed into plain rules of the form "If Condition(s) Then Action". Results: We evaluated the methodology on guidelines from three domains: hypertension, rhinosinusitis, and asthma. The deep learning model classifies CPG sentences with an accuracy of 95%. Rule extraction was validated with a user-centric approach, achieving Jaccard coefficients of 0.6, 0.7, and 0.4 against the rules extracted by each of three human experts, respectively.
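The final rule-formatting step and the Jaccard-based comparison can be sketched as follows. This is a hypothetical rule-based stand-in, not the authors' pipeline: it assumes the sentence has already been classified as condition-action, treats a small made-up list of qualifier terms as markers of the condition phrase, and assumes a comma separates the condition from the action.

```python
# Hypothetical qualifier terms that open a condition phrase.
QUALIFIERS = ("if", "when", "in patients with")


def to_rule(sentence):
    """Turn a condition-action sentence into 'If <condition> Then <action>'.

    Assumes the sentence starts with a qualifier term and that a comma
    separates the condition phrase from the action phrase; returns None otherwise."""
    text = sentence.strip().rstrip(".")
    lowered = text.lower()
    for qualifier in QUALIFIERS:
        if lowered.startswith(qualifier + " ") and "," in text:
            condition, action = text[len(qualifier):].split(",", 1)
            return f"If {condition.strip()} Then {action.strip()}"
    return None


def jaccard(rules_a, rules_b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| between two sets of extracted rules."""
    a, b = set(rules_a), set(rules_b)
    if not a and not b:
        return 1.0  # convention: two empty sets are treated as identical
    return len(a & b) / len(a | b)


print(to_rule("If blood pressure remains above target, add a second agent."))
print(jaccard({"rule_1", "rule_2", "rule_3"}, {"rule_2", "rule_3", "rule_4"}))
```

The example sentence and rule identifiers are invented; a system comparing its rules with an expert's would compute one such coefficient per expert.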

Machine learning is the new key to healthcare


As healthcare professionals face massive pressure not only to ensure the quality of care but also to come up with new solutions, cures, and treatments, they are becoming increasingly dependent on advanced technologies like artificial intelligence (AI) and machine learning (ML). But it is hardly a smooth partnership. Skills shortages at the entry level and "messy data" in leveraging patient records at the high end are merely the book-ends of a range of challenges that span these fields. Last week's annual Amazon Web Services re:Invent conference, one of the largest cloud-focused events in the world, saw the launch or demonstration of a range of new cloud-based tools that are ideal for health research and treatment. ML, defined as computer algorithms that improve automatically through experience, was at the heart of these.

Amazon launches HealthLake, a platform for storing and analyzing health care data


During its re:Invent 2020 virtual keynote today, Amazon launched Amazon HealthLake, a service that enables health care organizations to store, transform, and analyze up to petabytes of life science data in Amazon Web Services. Amazon says that the HIPAA-eligible HealthLake, which is available in preview starting today, can automatically understand and extract medical information including rules, procedures, and diagnoses in real time. Health care data is often spread across various systems, such as electronic medical records and lab systems, and it's challenging to organize because it's often unstructured. Data such as clinical notes in medical records, reports, forms like insurance claims, and image scans needs to be prepped and normalized before analyses can begin. HealthLake aims to address this challenge by enabling customers to apply intelligence to hundreds of thousands of data points across different silos in dozens of formats.

Diagnoss launches AI assistant to reduce medical coding errors


Startup Diagnoss has developed an artificial intelligence-based coding assistant to help automate the painstaking process of medical coding and billing. The Diagnoss AI medical coding engine acts as a "sidebar" to electronic health records (EHRs) and uses machine learning to improve clinicians' coding accuracy. The tool provides real-time feedback to medical practices during the administrative process and helps reduce coding errors on claims. Abboud Chaballout, founder and CEO of Berkeley, California-based Diagnoss, compares the AI tool to an assistant whispering in a doctor's ear. The tool works much like Grammarly's AI grammar checker.

A Survey on Principles, Models and Methods for Learning from Irregularly Sampled Time Series: From Discretization to Attention and Invariance (Machine Learning)

Irregularly sampled time series data arise naturally in many application domains, including biology, ecology, climate science, astronomy, and health. Such data pose fundamental challenges to many classical models from machine learning and statistics because the intervals between observations are non-uniform. However, the machine learning community has made significant progress over the last decade in developing specialized models and architectures for learning from irregularly sampled univariate and multivariate time series data. In this survey, we first describe several axes along which approaches differ, including what data representations they are based on, what modeling primitives they leverage to deal with the fundamental problem of irregular sampling, and what inference tasks they are designed to perform. We then survey the recent literature, organized primarily along the axis of modeling primitives. We describe approaches based on temporal discretization, interpolation, recurrence, attention, and structural invariance. We discuss similarities and differences between approaches and highlight their primary strengths and weaknesses.
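Temporal discretization, the first modeling primitive listed above, can be sketched as binning observations into fixed-width windows and averaging within each window, with a mask marking windows that received no observations. The bin width, the use of the mean, and the mask convention are illustrative choices, not a specific method from the survey.

```python
def discretize(times, values, bin_width, num_bins):
    """Average irregularly timed observations into fixed-width bins.

    Returns (means, mask): means[i] is the mean of observations with
    i * bin_width <= t < (i + 1) * bin_width, or None if the bin is empty;
    mask[i] is 1 when bin i was observed and 0 otherwise.
    Observations outside [0, num_bins * bin_width) are ignored."""
    sums = [0.0] * num_bins
    counts = [0] * num_bins
    for t, v in zip(times, values):
        idx = int(t // bin_width)
        if 0 <= idx < num_bins:
            sums[idx] += v
            counts[idx] += 1
    means = [s / c if c else None for s, c in zip(sums, counts)]
    mask = [1 if c else 0 for c in counts]
    return means, mask


# Three observations at irregular times; the middle window has no data.
means, mask = discretize([0.1, 0.4, 2.5], [1.0, 3.0, 7.0], bin_width=1.0, num_bins=3)
print(means, mask)
```

The mask keeps missingness explicit so a downstream model can distinguish an empty window from a measured value; choosing the bin width trades temporal resolution against the number of empty bins.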

Using machine learning to improve patient care


Doctors are often deluged with signals from charts, test results, and other metrics that they must keep track of. It can be difficult to integrate and monitor all of these data for multiple patients while making real-time treatment decisions, especially when data is documented inconsistently across hospitals. In a new pair of papers, researchers from MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) explore ways for computers to help doctors make better medical decisions. One team created a machine-learning approach called "ICU Intervene" that takes large amounts of intensive-care-unit (ICU) data, from vitals and labs to notes and demographics, to determine what kinds of treatments are needed for different symptoms. The system uses "deep learning" to make real-time predictions, learning from past ICU cases to make suggestions for critical care, while also explaining the reasoning behind these decisions.

SMR: Medical Knowledge Graph Embedding for Safe Medicine Recommendation (Artificial Intelligence)

Most existing medicine recommendation systems, which are mainly based on electronic medical records (EMRs), significantly assist doctors in making better clinical decisions, benefiting both patients and caregivers. Even though EMRs are growing at lightning-fast speed in the era of big data, content limitations in EMRs prevent existing recommendation systems from reflecting relevant medical facts, such as drug-drug interactions. Many medical knowledge graphs that contain drug-related information, such as DrugBank, offer promise for recommendation systems. However, using these knowledge graphs directly in such systems suffers from a lack of robustness caused by the incompleteness of the graphs. To address these challenges, we build on recent advances in graph embedding learning techniques and propose a novel framework, called Safe Medicine Recommendation (SMR), in this paper. Specifically, SMR first constructs a high-quality heterogeneous graph by bridging EMRs (MIMIC-III) and medical knowledge graphs (ICD-9 ontology and DrugBank). Then, SMR jointly embeds diseases, medicines, patients, and their corresponding relations into a shared lower-dimensional space. Finally, SMR uses the embeddings to decompose medicine recommendation into a link prediction process while considering the patient's diagnoses and adverse drug reactions. To the best of our knowledge, SMR is the first method to learn embeddings of a patient-disease-medicine graph for medicine recommendation. Extensive experiments on real datasets evaluate the effectiveness of the proposed framework.
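Casting recommendation as link prediction can be sketched with a translation-based score in the style of TransE, where a triple (head, relation, tail) is plausible when head + relation lies close to tail. This is a swapped-in, simplified technique for illustration only: the embeddings are made-up two-dimensional vectors, the patient is represented by a single vector, and adverse drug reactions are modeled only as a hypothetical exclusion set. SMR's actual embedding models and scoring are not reproduced here.

```python
import math


def transe_score(head, relation, tail):
    """TransE-style plausibility: negative L2 distance between head + relation and tail."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def recommend(patient_vec, relation_vec, medicine_vecs, excluded, k):
    """Rank candidate medicines for a patient by link-prediction score.

    medicine_vecs maps medicine name -> embedding; `excluded` holds medicines
    ruled out, e.g. because of a known adverse drug reaction (hypothetical)."""
    scored = [
        (name, transe_score(patient_vec, relation_vec, vec))
        for name, vec in medicine_vecs.items()
        if name not in excluded
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in scored[:k]]


# Made-up 2-D embeddings for illustration only.
patient = [0.0, 1.0]
prescribed = [1.0, 0.0]  # hypothetical "is prescribed" relation vector
medicines = {
    "drug_a": [1.0, 1.0],
    "drug_b": [0.0, 0.0],
    "drug_c": [1.0, 0.5],
    "drug_d": [1.1, 1.0],
}
print(recommend(patient, prescribed, medicines, excluded={"drug_d"}, k=2))
```

Here `drug_d` would otherwise rank second; excluding it shows, in the simplest possible way, how safety constraints can filter link-prediction candidates.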

Automated Coding of Under-Studied Medical Concept Domains: Linking Physical Activity Reports to the International Classification of Functioning, Disability, and Health (Artificial Intelligence)

Linking clinical narratives to standardized vocabularies and coding systems is a key component of unlocking the information in medical text for analysis. However, many domains of medical concepts lack well-developed terminologies that can support effective coding of medical text. We present a framework for developing natural language processing (NLP) technologies for automated coding of under-studied types of medical information, and demonstrate its applicability via a case study on physical mobility function. Mobility is a component of many health measures, from post-acute care and surgical outcomes to chronic frailty and disability, and is coded in the International Classification of Functioning, Disability, and Health (ICF). However, mobility and other types of functional activity remain under-studied in medical informatics, and neither the ICF nor commonly used medical terminologies capture functional status terminology as it is used in practice. We investigated two data-driven paradigms, classification and candidate selection, to link narrative observations of mobility to standardized ICF codes, using a dataset of clinical narratives from physical therapy encounters. Representations from recent advances in language modeling and word embedding were used as features for established machine learning models and a novel deep learning approach, achieving a macro-F1 score of 84% on linking mobility activity reports to ICF codes. The classification and candidate selection approaches each present distinct strengths for automated coding in under-studied domains, and we highlight that the combination of (i) a small annotated data set; (ii) expert definitions of codes of interest; and (iii) a representative text corpus is sufficient to produce high-performing automated coding systems. This study has implications for the ongoing growth of NLP tools for a variety of specialized applications in clinical care and research.
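The macro-F1 score reported above averages per-code F1 scores with equal weight, so rare codes count as much as frequent ones. A minimal sketch, using toy ICF-style labels rather than the study's data, and treating every label that appears in either the gold or predicted codes as a class:

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1, with F1 = 2TP / (2TP + FP + FN)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # Every label occurs in y_true or y_pred, so the denominator is positive.
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)


# Toy gold and predicted codes for four mobility reports (illustrative only).
gold = ["d450", "d450", "d410", "d420"]
pred = ["d450", "d410", "d410", "d420"]
print(round(macro_f1(gold, pred), 3))
```

Here the per-code F1 scores are 2/3, 2/3, and 1, so their unweighted mean is 7/9; a micro-averaged score would instead pool all counts across codes.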