

The Healthy States of America: Creating a Health Taxonomy with Social Media (Artificial Intelligence)

Since the uptake of social media, researchers have mined online discussions to track the outbreak and evolution of specific diseases or chronic conditions such as influenza or depression. To broaden the set of diseases under study, we developed a Deep Learning tool for Natural Language Processing that extracts mentions of virtually any medical condition or disease from unstructured social media text. With that tool at hand, we processed Reddit and Twitter posts, analyzed the clusters of the two resulting co-occurrence networks of conditions, and discovered that they correspond to well-defined categories of medical conditions. This resulted in the first comprehensive taxonomy of medical conditions automatically derived from online discussions. We validated the structure of our taxonomy against the official International Statistical Classification of Diseases and Related Health Problems (ICD-11), finding that our clusters match 20 of its 22 official categories. Based on mentions of our taxonomy's sub-categories in Reddit posts geo-referenced in the U.S., we were then able to compute disease-specific health scores. Unlike raw counts of disease mentions, or counts computed without knowledge of our taxonomy's structure, our disease-specific health scores are causally linked with the officially reported prevalence of 18 conditions.
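The co-occurrence network behind this approach can be illustrated with a small sketch. It assumes condition mentions have already been extracted from each post (the job of the paper's Deep Learning tool, which is not reproduced here), and it uses thresholded connected components as a simplified stand-in for whatever clustering method the authors actually applied. The function names and condition names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_counts(posts):
    """Count how many posts mention each unordered pair of conditions together."""
    counts = Counter()
    for conditions in posts:
        # set() ignores repeated mentions within one post; sorted() fixes the pair order
        for pair in combinations(sorted(set(conditions)), 2):
            counts[pair] += 1
    return counts


def cluster_conditions(posts, min_weight):
    """Group conditions into connected components of the co-occurrence graph.

    Pairs mentioned together in fewer than `min_weight` posts are not linked.
    This is a simplified stand-in for a proper community-detection step.
    """
    neighbors = defaultdict(set)
    for conditions in posts:
        for condition in conditions:
            neighbors.setdefault(condition, set())  # keep isolated conditions as nodes
    for (a, b), weight in cooccurrence_counts(posts).items():
        if weight >= min_weight:
            neighbors[a].add(b)
            neighbors[b].add(a)

    seen, clusters = set(), []
    for start in sorted(neighbors):
        if start in seen:
            continue
        # Depth-first search collects every condition reachable from `start`.
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for other in neighbors[node]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        clusters.append(sorted(component))
    return clusters
```

Raising `min_weight` prunes incidental co-mentions, so only conditions that are repeatedly discussed together end up in the same cluster.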

MIMIC-IF: Interpretability and Fairness Evaluation of Deep Learning Models on MIMIC-IV Dataset (Artificial Intelligence)

The recent release of large-scale healthcare datasets has greatly propelled research on data-driven deep learning models for healthcare applications. However, because such deep models are black boxes, concerns about interpretability, fairness, and bias in healthcare scenarios, where human lives are at stake, call for careful and thorough examination of both datasets and models. In this work, we focus on MIMIC-IV (Medical Information Mart for Intensive Care, version IV), the largest publicly available healthcare dataset, and conduct comprehensive analyses of dataset representation bias as well as the interpretability and prediction fairness of deep learning models for in-hospital mortality prediction. In terms of interpretability, we observe that (1) the best-performing interpretability method successfully identifies critical features for mortality prediction across various prediction models; and (2) demographic features are important for prediction. In terms of fairness, we observe that (1) there is disparate treatment in prescribing mechanical ventilation among patient groups across ethnicity, gender, and age; and (2) all of the studied mortality predictors are generally fair, with the IMV-LSTM (Interpretable Multi-Variable Long Short-Term Memory) model providing the most accurate and unbiased predictions across all protected groups. We further draw concrete connections between interpretability methods and fairness metrics by showing how feature importance from interpretability methods can help quantify potential disparities in mortality predictors.
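The abstract does not name the fairness metrics it uses. As an assumption, the sketch below computes two common group-fairness quantities: the per-group positive-prediction rate (demographic parity) and the per-group true-positive rate (equal opportunity), plus the largest gap between groups. Applied to actual treatment decisions instead of model predictions, the positive rate is also one simple way to look at disparate treatment such as ventilation rates. Function names, group labels, and data are hypothetical.

```python
import math
from collections import defaultdict


def group_rates(y_true, y_pred, groups):
    """Per-group positive-prediction rate and true-positive rate (TPR).

    y_true: observed 0/1 outcomes (e.g. in-hospital mortality)
    y_pred: 0/1 predictions from a mortality model
    groups: protected attribute per patient (e.g. ethnicity, gender, age band)
    """
    tallies = defaultdict(lambda: {"n": 0, "pred_pos": 0, "pos": 0, "tp": 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        t = tallies[group]
        t["n"] += 1
        t["pred_pos"] += pred
        t["pos"] += truth
        t["tp"] += truth * pred
    return {
        group: {
            "positive_rate": t["pred_pos"] / t["n"],
            # TPR is undefined for a group with no actual positives
            "tpr": t["tp"] / t["pos"] if t["pos"] else math.nan,
        }
        for group, t in tallies.items()
    }


def max_gap(rates, key):
    """Largest between-group difference for one metric (0 means equal across groups)."""
    values = [r[key] for r in rates.values() if not math.isnan(r[key])]
    return max(values) - min(values)
```

A gap near zero on both metrics is what "generally fair" would look like under these particular definitions; other fairness criteria can disagree with them.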

Research Story Tip: AI and Deep Learning Can Analyze 'Rash Selfies' for Better Lyme Disease Detection


A report on the findings was published in the October 2020 issue of the journal Computers in Biology and Medicine. Scientists at the Johns Hopkins Applied Physics Laboratory (APL) developed and tested several deep learning (DL) models to accurately distinguish erythema migrans (EM), the characteristic skin rash of early Lyme disease, from other dermatological conditions and from normal skin. The DL models were trained to recognize the appearance of EM using public-domain images of non-EM rashes and normal skin, together with clinical photos of patients with EM provided by the Johns Hopkins University Lyme Disease Research Center and the Lyme Disease Biobank, part of the Johns Hopkins University School of Medicine's Division of Rheumatology. There are more than 300,000 new cases of Lyme disease annually in the United States, and treatment is most effective when the disease is caught early. Misdiagnosis, especially in the disease's initial stages, is common because of several challenges.

Industry News


A listing of the latest industry news in genomics, genetics, precision medicine, and beyond, updated monthly.

As 2019 came to an end, Veritas Genetics struggled to raise funding because of concerns that it had previously taken money from China. It was forced to cease US operations and is in talks with potential buyers. The GenomeAsia 100K Project announced its pilot phase, which aims to tackle the underrepresentation of non-Europeans in human genetic studies and enable genetic discoveries across Asia.

Veritas Genetics, the start-up that can sequence a human genome for less than $600, ceases US operations and is in talks with potential buyers
Veritas Genetics ceases US operations but will continue its operations in Europe and Latin America. It had trouble raising funding due to previous China investments and is looking to be acquired.

Illumina loses DNA sequencing patents
The European Patent ...

Modelling of Sickle Cell Anemia Patients' Response to Hydroxyurea using Artificial Neural Networks (Artificial Intelligence)

Hydroxyurea (HU) has been shown to be effective in alleviating the symptoms of Sickle Cell Anemia. While HU reduces the complications associated with Sickle Cell Anemia in some patients, others do not benefit from the drug and experience deleterious effects, since it is also a chemotherapeutic agent. The main question facing the responsible physician is therefore: for which patients should HU be considered a viable option? We address this question by developing modeling techniques that can predict a patient's response to HU and thereby spare non-responsive patients its unnecessary effects. The models are based on the values of 22 parameters that can be obtained from blood samples, collected from 122 patients. Using these data, we developed Deep Artificial Neural Network models that can predict the final HbF (fetal hemoglobin) value of a subject after HU therapy with 92.6% accuracy. Our current studies focus on forecasting a patient's HbF response 30 days ahead of time.
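A minimal sketch of a neural-network regressor for this task: the 22 blood parameters form the input vector and the network outputs a predicted final HbF value. The architecture (a single tanh hidden layer of hypothetical size), the plain SGD training loop, and the `TinyMLP` name are assumptions for illustration; this is not the paper's Deep ANN and does not reproduce its 92.6% figure. In practice the blood parameters would typically be standardized before training.

```python
import math
import random


class TinyMLP:
    """One tanh hidden layer and a linear output, trained with SGD on squared error."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(n_inputs)
        self.w1 = [[rng.uniform(-scale, scale) for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        output = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
        return hidden, output

    def predict(self, x):
        return self.forward(x)[1]

    def sgd_step(self, x, target, lr):
        """One gradient step on 0.5 * (prediction - target)**2 for a single patient."""
        hidden, output = self.forward(x)
        err = output - target  # derivative of the loss with respect to the output
        for j, h in enumerate(hidden):
            # Gradient at hidden unit j's pre-activation (tanh' = 1 - tanh^2),
            # computed before w2[j] is updated.
            grad_pre = err * self.w2[j] * (1.0 - h * h)
            self.w2[j] -= lr * err * h
            self.b1[j] -= lr * grad_pre
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * grad_pre * xi
        self.b2 -= lr * err


def mse(model, xs, ys):
    """Mean squared error of the model's predictions."""
    return sum((model.predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
```

Training loops `model.sgd_step(x, hbf, lr=0.01)` over the patients for a number of epochs, after which `model.predict(x)` returns the forecast HbF for a new patient's 22-value blood profile.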

Semi-Supervised Natural Language Approach for Fine-Grained Classification of Medical Reports (Machine Learning)

Although machine learning has become a powerful tool to augment doctors in clinical analysis, the immense amount of labeled data needed to train supervised learning approaches makes each development task time- and resource-intensive. Most dense clinical information is stored in written reports that detail pertinent patient information. Using natural language data for standard model development is challenging because of the complex nature of the modality. In this research, a model pipeline was developed that uses an unsupervised approach to train an encoder language model (a recurrent network) to generate document encodings. These encodings can then be passed as features to a decoder-classifier model that requires orders of magnitude less labeled data than previous approaches to accurately differentiate between fine-grained disease classes. The language model was trained on unlabeled radiology reports from the Massachusetts General Hospital Radiology Department (n=218,159) and reached a final loss of 1.62. The classification models were trained on three labeled datasets of head CT reports covering large vessel occlusion (n=1403), acute ischemic stroke (n=331), and intracranial hemorrhage (n=4350), to identify a variety of findings directly from the radiology report text, achieving AUCs of 0.98, 0.95, and 0.99, respectively, for the large vessel occlusion, acute ischemic stroke, and intracranial hemorrhage datasets. The output encodings can be combined with imaging data to create models that process multiple modalities. Automatically extracting relevant features from text speeds up model development and the integration of the textual modality, making clinical reports a more viable input for broader and more accurate deep learning models.
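The two-stage design can be sketched as a frozen encoder feeding a small supervised classifier, evaluated with ROC AUC as in the reported results. The bag-of-words `encode` function is a deliberately simple, hypothetical stand-in for the pretrained recurrent encoder; the point is only that the encoder stays fixed, so labeled reports are needed solely to fit the lightweight logistic-regression head. The vocabulary and example reports are invented.

```python
import math

# Hypothetical vocabulary; the real pipeline uses a pretrained recurrent language model instead.
VOCAB = ["hemorrhage", "bleed", "occlusion", "infarct", "normal", "unremarkable"]


def encode(report):
    """Frozen stand-in encoder: map a report to a fixed-length feature vector.

    Nothing here is fitted to labels, mirroring how the pretrained encoder is reused.
    """
    tokens = report.lower().split()
    return [float(tokens.count(word)) for word in VOCAB]


def score(w, b, x):
    """Linear score (log-odds) of the classifier head."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_classifier(features, labels, lr=0.5, epochs=200):
    """Logistic-regression head trained with full-batch gradient descent on log-loss."""
    w = [0.0] * len(features[0])
    b = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(features, labels):
            p = 1.0 / (1.0 + math.exp(-score(w, b, x)))
            for i, xi in enumerate(x):
                grad_w[i] += (p - y) * xi / n
            grad_b += (p - y) / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def auc(scores, labels):
    """ROC AUC as the fraction of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

Because only `w` and `b` are learned, the number of trainable parameters is tiny compared with the encoder, which is the intuition behind needing far fewer labeled reports.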

Neural network system has achieved remarkable accuracy in detecting brain hemorrhages


Deep learning and its applications have grown in recent years. Researchers from ETH Zurich recently used the technique to study dark matter in an industry first. Now a team working with the University of California, Berkeley and the University of California, San Francisco (UCSF) School of Medicine has trained a convolutional neural network dubbed "PatchFCN" that detects brain hemorrhages with remarkable accuracy. In a paper titled "Expert-level detection of acute intracranial hemorrhage on head computed tomography using deep learning", the team writes: "We used a single-stage, end-to-end, fully convolutional neural network to achieve accuracy levels comparable to that of highly trained radiologists, including both identification and localization of abnormalities that are missed by radiologists." The team reports an accuracy of 99 percent, reportedly the highest recorded accuracy to date for detecting brain hemorrhages.

Multi-label Detection and Classification of Red Blood Cells in Microscopic Images (Machine Learning)

Cell detection and cell type classification from biomedical images play an important role in high-throughput imaging and various clinical applications. While single-cell samples can be classified with standard computer vision and machine learning methods, analysis of multi-label samples (regions containing clustered cells) is more challenging, as separating individual cells can be difficult (e.g., touching cells) or even impossible (e.g., overlapping cells). As multi-instance images are common when analyzing red blood cells (RBCs) for Sickle Cell Disease (SCD) diagnosis, we develop and implement a multi-instance cell detection and classification framework to address this challenge. The framework first trains a region proposal model based on the Region-based Convolutional Network (RCNN) to obtain bounding boxes of regions potentially containing single or multiple cells in the input microscopic images; these regions are extracted as image patches. High-level image features are then computed from the patches by a pre-trained Convolutional Neural Network (CNN) with a ResNet-50 architecture. Using these image features as inputs, six networks are then trained to make multi-label predictions of whether a given patch contains cells of a specific cell type. Because the six networks are trained with image patches containing both individual cells and touching or overlapping cells, they can effectively recognize cell types present in multi-instance image samples. Finally, for the purpose of SCD testing, we train another machine learning classifier to predict, from the outputs of the six networks, whether a given image patch contains abnormal cell types. Testing results show that the proposed framework achieves good performance in automatic cell detection and classification.
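The difference between the multi-label stage and ordinary single-label classification fits in a few lines. This sketch assumes each of the six networks emits one logit for "this patch contains my cell type". The cell-type names, the 0.5 threshold, and the rule-based abnormality flag (a stand-in for the paper's trained second-stage classifier) are hypothetical.

```python
import math

# Hypothetical names for the six per-type outputs (illustrative, not the paper's classes).
CELL_TYPES = ["discocyte", "echinocyte", "elliptocyte", "granular", "reticulocyte", "sickle"]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def multilabel_predict(logits, threshold=0.5):
    """One independent sigmoid per cell type, so a patch can receive several labels."""
    return {cell for cell, z in zip(CELL_TYPES, logits) if sigmoid(z) >= threshold}


def single_label_predict(logits):
    """Argmax over the logits (what a softmax classifier would pick): always one label."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return CELL_TYPES[best]


def flag_abnormal(labels):
    """Stand-in for the trained second-stage classifier: flag patches containing sickle cells."""
    return "sickle" in labels
```

For a patch where a normal-looking cell overlaps a sickle cell, the multi-label outputs can report both, whereas a single-label classifier must pick one and may hide the abnormal cell.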

Reinforcement Learning in Healthcare: A Survey (Artificial Intelligence)

As a subfield of machine learning, reinforcement learning (RL) aims to improve an agent's behavioural decision making by using its interaction experience with the world and evaluative feedback. Unlike traditional supervised learning methods, which usually rely on one-shot, exhaustive, and supervised reward signals, RL tackles sequential decision-making problems whose feedback is simultaneously sampled, evaluative, and delayed. These distinctive features make RL a suitable candidate for developing powerful solutions in a variety of healthcare domains, where diagnostic decisions or treatment regimes are usually characterized by prolonged, sequential procedures. This survey discusses the broad applications of RL techniques in healthcare domains, in order to provide the research community with a systematic understanding of the theoretical foundations, enabling methods and techniques, existing challenges, and new insights of this emerging paradigm. After briefly examining the theoretical foundations and key techniques of RL research from the perspectives of efficiency and representation, we provide an overview of RL applications in a variety of healthcare domains, ranging from dynamic treatment regimes in chronic diseases and critical care, to automated medical diagnosis from both unstructured and structured clinical data, to many other control or scheduling domains that pervade a healthcare system. Finally, we summarize the challenges and open issues in current research, and point out some potential solutions and directions for future research.
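To make sampled, evaluative, and delayed feedback concrete, here is tabular Q-learning on a toy treatment problem. The states, actions, and rewards are invented for illustration and do not come from the survey: treating a sick patient costs a little, waiting while sick costs more, over-treating a recovering patient is penalized, and the large reward arrives only when the patient becomes healthy.

```python
import random

# Toy, hypothetical treatment MDP: (state, action) -> (next_state, reward).
TRANSITIONS = {
    ("sick", "treat"): ("recovering", -1.0),        # treatment has a small cost
    ("sick", "wait"): ("sick", -2.0),               # untreated illness costs more
    ("recovering", "treat"): ("recovering", -1.0),  # over-treatment is penalized
    ("recovering", "wait"): ("healthy", 10.0),      # delayed reward on recovery
}
ACTIONS = ["treat", "wait"]
TERMINAL = "healthy"


def q_learning(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.3, max_steps=50, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = {key: 0.0 for key in TRANSITIONS}
    for _ in range(episodes):
        state = "sick"
        for _ in range(max_steps):
            if state == TERMINAL:
                break
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
            next_state, reward = TRANSITIONS[(state, action)]
            # Bootstrapped estimate of future return; zero once the episode ends.
            future = 0.0 if next_state == TERMINAL else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state = next_state
    return q


def greedy_policy(q):
    """Best action per non-terminal state under the learned Q-values."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in ("sick", "recovering")}
```

With gamma = 0.9 the learned values approach the optimum Q(recovering, wait) = 10 and Q(sick, treat) = -1 + 0.9 * 10 = 8, so the greedy policy treats a sick patient and stops treatment once the patient is recovering, even though the payoff for treating arrives only later.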

Predicting Treatment Initiation from Clinical Time Series Data via Graph-Augmented Time-Sensitive Model (Machine Learning)

Many computational models have been proposed to extract temporal patterns from clinical time series, both for individual patients and across patient groups, for predictive healthcare. However, common relations among patients (e.g., sharing the same doctor) have rarely been considered. In this paper, we represent patient-clinician relations as bipartite graphs that capture, for example, from whom a patient received a diagnosis. We then solve for the top eigenvectors of the graph Laplacian and incorporate these eigenvectors, as latent representations of the similarity between patient-clinician pairs, into a time-sensitive prediction model. We conducted experiments on real-world data to predict the initiation of first-line treatment for Chronic Lymphocytic Leukemia (CLL) patients. Results show that relational similarity can improve prediction over multiple baselines, for example a 5% improvement over a long short-term memory (LSTM) baseline in area under the precision-recall curve.
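A minimal sketch of the graph step: build the unnormalized Laplacian L = D - A of a patient-clinician bipartite graph and use its eigenvectors as node coordinates. We assume "top eigenvectors" means those with the smallest nonzero eigenvalues, as in Laplacian eigenmaps; the paper may use a different convention or a normalized Laplacian. The eigendecomposition uses the classical cyclic Jacobi rotation method so the example needs only the standard library (a real pipeline would call a linear-algebra library), and the node names are hypothetical.

```python
import math


def laplacian(nodes, edges):
    """Unnormalized Laplacian L = D - A of an undirected graph."""
    index = {node: i for i, node in enumerate(nodes)}
    lap = [[0.0] * len(nodes) for _ in nodes]
    for u, v in edges:
        i, j = index[u], index[v]
        lap[i][j] -= 1.0
        lap[j][i] -= 1.0
        lap[i][i] += 1.0
        lap[j][j] += 1.0
    return lap


def jacobi_eigh(matrix, max_sweeps=100):
    """Eigenpairs of a symmetric matrix via cyclic Jacobi rotations, sorted by eigenvalue."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < 1e-20:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J accumulates eigenvectors as columns
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    pairs = [(a[i][i], [v[k][i] for k in range(n)]) for i in range(n)]
    return sorted(pairs, key=lambda pair: pair[0])


def spectral_embedding(nodes, edges, k):
    """Coordinates from the k eigenvectors after the trivial constant one (connected graph)."""
    pairs = jacobi_eigh(laplacian(nodes, edges))
    chosen = [vec for _, vec in pairs[1:k + 1]]
    return {node: [vec[i] for vec in chosen] for i, node in enumerate(nodes)}
```

One simple way to feed these coordinates into a time-sensitive model (an assumption, not necessarily the paper's design) is to append each patient's embedding to that patient's features at every time step; patients who share clinicians then receive similar extra features.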