Khan, Atif
NCL-SM: A Fully Annotated Dataset of Images from Human Skeletal Muscle Biopsies
Khan, Atif, Lawless, Conor, Vincent, Amy, Warren, Charlotte, Di Leo, Valeria, Gomes, Tiago, McGough, A. Stephen
Single cell analysis of human skeletal muscle (SM) tissue cross-sections is a fundamental tool for understanding many neuromuscular disorders. For this analysis to be reliable and reproducible, identification of individual fibres within microscopy images (segmentation) of SM tissue should be automatic and precise. Biomedical scientists in this field currently rely on custom tools and general machine learning (ML) models, both followed by labour-intensive and subjective manual interventions to fine-tune segmentation. We believe that fully automated, precise, reproducible segmentation is possible by training ML models. However, in this important biomedical domain, there are currently no good-quality, publicly available annotated imaging datasets for ML model training. In this paper we release NCL-SM: a high-quality bioimaging dataset of 46 human SM tissue cross-sections from both healthy control subjects and patients with genetically diagnosed muscle pathology. These images include more than 50k manually segmented muscle fibres (myofibres). In addition, we curated high-quality myofibre segmentations, annotating reasons for rejecting low-quality myofibres and low-quality regions in SM tissue images, making these annotations completely ready for downstream analysis. This, we believe, will pave the way for development of a fully automatic pipeline that identifies individual myofibres within images of tissue sections and, in particular, also classifies individual myofibres that are fit for further analysis.
Introducing NCL-SM: A Fully Annotated Dataset of Images from Human Skeletal Muscle Biopsies
Khan, Atif, Lawless, Conor, Vincent, Amy, Warren, Charlotte, Di Leo, Valeria, Gomes, Tiago, McGough, A. Stephen
Single cell analysis of skeletal muscle (SM) tissue is a fundamental tool for understanding many neuromuscular disorders. For this analysis to be reliable and reproducible, identification of individual fibres within microscopy images (segmentation) of SM tissue should be precise. There is currently no tool or pipeline that makes automatic and precise segmentation and curation of images of SM tissue cross-sections possible. Biomedical scientists in this field rely on custom tools and general machine learning (ML) models, both followed by labour-intensive and subjective manual interventions to get the segmentation right. We believe that automated, precise, reproducible segmentation is possible by training ML models. However, there are currently no good-quality, publicly available annotated imaging datasets for ML model training. In this paper we release NCL-SM: a high-quality bioimaging dataset of 46 human tissue sections from healthy control subjects and from patients with genetically diagnosed muscle pathology. These images include more than 50k manually segmented muscle fibres (myofibres). In addition, we curated high-quality myofibres and annotated reasons for rejecting low-quality myofibres and regions in SM tissue images, making this data completely ready for downstream analysis. This, we believe, will pave the way for development of a fully automatic pipeline that identifies individual myofibres within images of tissue sections and, in particular, also classifies individual myofibres that are fit for further analysis.
Explainable Deep Learning to Profile Mitochondrial Disease Using High Dimensional Protein Expression Data
Khan, Atif, Lawless, Conor, Vincent, Amy E, Pilla, Satish, Ramesh, Sushanth, McGough, A. Stephen
Mitochondrial diseases are currently untreatable due to our limited understanding of their pathology. We study the expression of various mitochondrial proteins in skeletal myofibres (SM) using Imaging Mass Cytometry (IMC) in order to discover processes involved in mitochondrial pathology. IMC produces high-dimensional multichannel pseudo-images representing spatial variation in the expression of a panel of proteins within a tissue, including subcellular variation. Statistical analysis of these images requires semi-automated annotation of thousands of SMs in IMC images of patient muscle biopsies. In this paper we investigate the use of deep learning (DL) on raw IMC data to analyse it without any manual pre-processing steps, statistical summaries or statistical models. For this we first train state-of-the-art computer vision DL models on all available image channels, both combined and individually. We observed better than expected accuracy for many of these models. We then apply state-of-the-art explainability techniques relevant to computer vision DL to find the basis of the predictions of these models. Some of the resulting visual explanation maps highlight features in the images that appear consistent with the latest hypotheses about mitochondrial disease progression within myofibres.
Lightweight Mobile Automated Assistant-to-physician for Global Lower-resource Areas
Zhang, Chao, Zhang, Hanxin, Khan, Atif, Kim, Ted, Omoleye, Olasubomi, Abiona, Oluwamayomikun, Lehman, Amy, Olopade, Christopher O., Olopade, Olufunmilayo I., Lopes, Pedro, Rzhetsky, Andrey
Importance: Lower-resource areas in Africa and Asia face a unique set of healthcare challenges: the dual high burden of communicable and non-communicable diseases; a paucity of highly trained primary healthcare providers in both rural and densely populated urban areas; and a lack of reliable, inexpensive internet connections. Objective: To address these challenges, we designed an artificial intelligence assistant to help primary healthcare providers in lower-resource areas document demographic and medical sign/symptom data and to record and share diagnostic data in real time with a centralized database. Design: We trained our system using multiple data sets, including US-based electronic medical records (EMRs) and open-source medical literature, and developed an adaptive, general medical assistant system based on machine learning algorithms. Main Outcomes and Measures: The application collects basic information from patients and provides primary care providers with diagnosis and prescription suggestions. The application differs from existing systems in that it covers a wide range of common diseases, signs, and medications typical in lower-resource countries, and it works with or without an active internet connection. Results: We have built and implemented an adaptive learning system that assists trained primary care professionals by means of an Android smartphone application, which interacts with a central database and collects real-time data. The application has been tested by dozens of primary care providers. Conclusions and Relevance: Our application would provide primary healthcare providers in lower-resource areas with a tool that enables faster and more accurate documentation of medical encounters. This application could be leveraged to automatically populate local or national EMR systems.
Privacy Preference Inference via Collaborative Filtering
Khazaei, Taraneh (University of Western Ontario) | Xiao, Lu (University of Western Ontario) | Mercer, Robert E. (University of Western Ontario) | Khan, Atif (InfoTrellis Inc.)
Studies of online social behaviour indicate that users often fail to specify privacy settings that match their privacy behaviour. This issue has created a dilemma over whether to use publicly available data for targeted advertising and personalization. As a possible approach to managing this dilemma, we propose a collaborative filtering method that exploits homophily to build a probabilistic model. Such a model can indicate the likelihood that a given public profile is meant to be private. Here, we provide the results of an analysis of a set of observable variables to be used in a neighbourhood-based manner. In addition, we establish a social graph augmented with privacy information. Users in the graph are then transformed into a set of latent features, uncovering informative factors to infer privacy preferences.