
What limits performance of weakly supervised deep learning for chest CT classification?

Tushar, Fakrul Islam, D'Anniballe, Vincent M., Rubin, Geoffrey D., Lo, Joseph Y.

arXiv.org Artificial Intelligence

Weakly supervised learning with noisy data has drawn attention in the medical imaging community due to the sparsity of high-quality disease labels. However, little is known about the limitations of such weakly supervised learning and the effect of these constraints on disease classification performance. In this paper, we test the effects of such weak supervision by examining model tolerance for three conditions. First, we examined model tolerance for noisy data by incrementally increasing error in the labels within the training data. Second, we assessed the impact of dataset size by varying the amount of training data. Third, we compared performance differences between binary and multi-label classification. Results demonstrated that the model could endure up to 10% added label error before experiencing a decline in disease classification performance. Disease classification performance steadily rose as the amount of training data was increased for all disease classes, before plateauing at 75% of training data. Finally, the binary model outperformed the multi-label model in every disease category. However, such interpretations may be misleading, as the binary model was heavily influenced by co-occurring diseases and may not have learned the specific features of the disease in the image. In conclusion, this study may help the medical imaging community understand the benefits and risks of weak supervision with noisy labels. Such studies demonstrate the need to build diverse, large-scale datasets and to develop explainable and responsible AI.
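The label-corruption experiment described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code; the `flip_labels` helper and the 10% error rate used here are assumptions for the example.

```python
import random

def flip_labels(labels, error_rate, seed=0):
    """Randomly flip a fraction of binary labels to simulate annotation noise."""
    rng = random.Random(seed)
    noisy = list(labels)
    n_flip = int(round(error_rate * len(noisy)))
    for i in rng.sample(range(len(noisy)), n_flip):
        noisy[i] = 1 - noisy[i]  # flip 0 <-> 1
    return noisy

clean = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0]
noisy = flip_labels(clean, error_rate=0.10)  # 10% added label error
disagreements = sum(a != b for a, b in zip(clean, noisy))
```

Training runs would then compare classifier performance across increasing `error_rate` values to locate the tolerance threshold.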


DrasCLR: A Self-supervised Framework of Learning Disease-related and Anatomy-specific Representation for 3D Medical Images

Yu, Ke, Sun, Li, Chen, Junxiang, Reynolds, Max, Chaudhary, Tigmanshu, Batmanghelich, Kayhan

arXiv.org Artificial Intelligence

Large-scale volumetric medical images with annotation are rare, costly, and time prohibitive to acquire. Self-supervised learning (SSL) offers a promising pre-training and feature extraction solution for many downstream tasks, as it only uses unlabeled data. Recently, SSL methods based on instance discrimination have gained popularity in the medical imaging domain. However, SSL pre-trained encoders may use many clues in the image to discriminate an instance that are not necessarily disease-related. Moreover, pathological patterns are often subtle and heterogeneous, requiring the desired method to represent anatomy-specific features that are sensitive to abnormal changes in different body parts. In this work, we present a novel SSL framework, named DrasCLR, for 3D medical imaging to overcome these challenges. We propose two domain-specific contrastive learning strategies: one aims to capture subtle disease patterns inside a local anatomical region, and the other aims to represent severe disease patterns that span larger regions. We formulate the encoder using a conditional hyper-parameterized network, in which the parameters are dependent on the anatomical location, to extract anatomically sensitive features. Extensive experiments on large-scale computed tomography (CT) datasets of lung images show that our method improves the performance of many downstream prediction and segmentation tasks. The patient-level representation improves the performance of the patient survival prediction task. We show how our method can detect emphysema subtypes via dense prediction. We demonstrate that fine-tuning the pre-trained model can significantly reduce annotation efforts without sacrificing emphysema detection accuracy. Our ablation study highlights the importance of incorporating anatomical context into the SSL framework.
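The instance-discrimination objective underlying such contrastive SSL methods is typically an InfoNCE-style loss over embeddings. The sketch below illustrates that generic loss in plain Python; it is not DrasCLR itself, and the toy 2-D embeddings are invented for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce(anchor, positive, negatives, temperature=0.1):
    """Contrastive loss: pull the positive embedding toward the anchor,
    push negatives away. Returns -log softmax score of the positive."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract max for numerical stability
    denom = sum(math.exp(l - m) for l in logits)
    return -(logits[0] - m - math.log(denom))

# A well-aligned positive yields a lower loss than a misaligned one.
loss_easy = info_nce([1, 0], [1, 0.1], [[-1, 0], [0, -1]])
loss_hard = info_nce([1, 0], [-1, 0], [[1, 0.1], [0, -1]])
```

DrasCLR's contribution is in how the anchor/positive pairs are sampled (local anatomical regions vs. larger spans) and in conditioning the encoder parameters on anatomical location, neither of which this generic loss captures.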


Integrating AI Into Radiology Practice to Enhance Lung Disease Prediction and Diagnosis

#artificialintelligence

Lung cancer continues to be the leading cause of cancer-related deaths globally, with 5-year survival rates still hovering around 20%. Tobacco use continues to be the major risk factor for lung cancer. However, in nonsmokers, exposure to some industrial toxins such as arsenic, certain organic compounds, radon, asbestos, radiation, air pollution, and environmental tobacco smoke also increases the chance of developing lung cancer. Although the incidence of these primary risk factors is decreasing, the implementation of comprehensive tobacco control and the reduction of occupational chemical exposure are not the only measures for decreasing lung cancer mortality. Artificial intelligence has been shown to be beneficial in the discovery of prognostic biomarkers for lung cancer diagnosis, treatment, and response evaluation, putting it at the forefront of the next phase of personalized medicine.


Training Deep Learning models with small datasets

Romero, Miguel, Interian, Yannet, Solberg, Timothy, Valdes, Gilmer

arXiv.org Machine Learning

Miguel Romero BSc (1), Yannet Interian PhD (1), Timothy Solberg PhD (2), and Gilmer Valdes PhD (2); (1) Master of Science in Data Science, University of San Francisco, San Francisco, CA; (2) Department of Radiation Oncology, University of California San Francisco, San Francisco, CA. December 17, 2019.

Abstract: The growing use of machine learning has produced significant advances in many fields. For image-based tasks, however, the use of deep learning remains challenging with small datasets. In this article, we review, evaluate, and compare current state-of-the-art techniques for training neural networks to elucidate which work best for small datasets. We further propose a path forward for improving model accuracy in medical imaging applications. We observed the best results from one-cycle training, discriminative learning rates with gradual freezing, and parameter modification after transfer learning. We also established that when datasets are small, transfer learning plays an important role beyond parameter initialization by reusing previously learned features. Surprisingly, we observed little advantage in using networks pre-trained on images from another part of the body compared to ImageNet. On the contrary, if images from the same part of the body are available, then transfer learning can produce a significant improvement in performance with as few as 50 images in the training data.

1 Introduction. The use of machine learning in medical imaging, radiation theranostics, and medical physics applications has created tremendous opportunity, with research that encompasses quality assurance [1, 2, 3, 4, 5, 6], outcome prediction [7, 8, 9, 10, 11, 12, 13], segmentation [14, 15, 16, 17], and dosimetric prediction. (Equal contribution authors. Partially supported by the Wicklow AI and Medical Research Initiative at the Data Institute.)
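The one-cycle training policy listed among the best-performing techniques can be sketched as a learning-rate schedule that ramps up and then anneals down. This linear variant is written for clarity; the schedule shape and the `pct_warmup` and `final_div` parameters are assumptions for the example, not the authors' exact settings.

```python
def one_cycle_lr(step, total_steps, max_lr=0.01, pct_warmup=0.3, final_div=100.0):
    """One-cycle policy: linearly ramp the learning rate up to max_lr,
    then anneal it back down toward max_lr / final_div."""
    warmup_steps = int(total_steps * pct_warmup)
    min_lr = max_lr / final_div
    if step < warmup_steps:
        frac = step / max(warmup_steps, 1)
        return min_lr + frac * (max_lr - min_lr)
    frac = (step - warmup_steps) / max(total_steps - warmup_steps, 1)
    return max_lr - frac * (max_lr - min_lr)

# Learning rate per optimizer step over a 100-step run.
schedule = [one_cycle_lr(s, total_steps=100) for s in range(100)]
```

In practice libraries such as fastai or PyTorch provide this schedule directly (often with cosine rather than linear segments); the point of the sketch is only the rise-then-fall shape.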


Disease Detection in Weakly Annotated Volumetric Medical Images using a Convolutional LSTM Network

Braman, Nathaniel, Beymer, David, Dehghan, Ehsan

arXiv.org Machine Learning

We explore a solution for learning disease signatures from weak, yet easily obtainable, annotations of volumetric medical imaging data by analyzing 3D volumes as a sequence of 2D images. We demonstrate the performance of our solution in the detection of emphysema in lung cancer screening low-dose CT images. Our approach utilizes a convolutional long short-term memory (LSTM) network to "scan" sequentially through an imaging volume for the presence of disease in a portion of the scanned region. This framework allowed effective learning given only volumetric images and binary disease labels, thus enabling training from a large dataset of 6,631 un-annotated image volumes from 4,486 patients. When evaluated on a testing set of 2,163 volumes from 2,163 patients, our model distinguished emphysema with an area under the receiver operating characteristic curve (AUC) of 0.83. This approach was found to outperform 2D convolutional neural networks (CNN) implemented with various multiple-instance learning schemes (AUC=0.69-0.76) and a 3D CNN (AUC=0.77).
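The sequential "scan" through slices can be caricatured as a running aggregation over slice-level disease scores. The real model uses a convolutional LSTM with learned gates; the leaky-max state update and the toy slice scores below are stand-ins invented purely for illustration.

```python
def scan_volume(slice_scores, decay=0.8):
    """Scan slice-level evidence sequentially, carrying a hidden state.
    The leaky max stands in for the ConvLSTM's memory cell: strong
    evidence seen in any slice persists, gradually decaying."""
    state = 0.0
    for score in slice_scores:
        state = max(decay * state, score)  # remember strong evidence
    return state

# Toy per-slice disease scores for two volumes (invented values).
healthy = [0.05, 0.10, 0.08, 0.04]
diseased = [0.05, 0.10, 0.92, 0.30]  # emphysema visible in one region
```

A volume-level score built this way needs only a binary volume label for supervision, which is the property the abstract highlights.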


Classification of COPD with Multiple Instance Learning

Cheplygina, Veronika, Sørensen, Lauge, Tax, David M. J., Pedersen, Jesper Holst, Loog, Marco, de Bruijne, Marleen

arXiv.org Machine Learning

Chronic obstructive pulmonary disease (COPD) is a lung disease for which early detection improves survival. COPD can be quantified by classifying patches of computed tomography images and combining the patch labels into an overall diagnosis for the image. As labeled patches are often not available, image labels are propagated to the patches, incorrectly labeling healthy patches in COPD patients as being affected by the disease. We approach quantification of COPD from lung images as a multiple instance learning (MIL) problem, which is more suitable for such weakly labeled data. We investigate various MIL assumptions in the context of COPD and show that although a concept region with COPD-related disease patterns is present, considering the whole distribution of lung tissue patches improves performance. The best method is based on averaging instances and obtains an AUC of 0.742, higher than the previously reported best of 0.713 on the same dataset. Using the full training set further increases performance to 0.776, which is significantly higher (DeLong test) than previous results.
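The best-performing "averaging instances" MIL assumption reduces to taking the mean of patch-level probabilities as the image-level score. A minimal sketch, with invented patch probabilities:

```python
def bag_probability(instance_probs):
    """Average-instance MIL: the image-level (bag) score is the mean of
    patch-level (instance) disease probabilities."""
    return sum(instance_probs) / len(instance_probs)

# A COPD lung may contain many healthy-looking patches; averaging the
# whole distribution still separates it from a fully healthy lung.
copd_patches = [0.1, 0.2, 0.9, 0.8, 0.15]
healthy_patches = [0.1, 0.05, 0.2, 0.1, 0.12]
```

This contrasts with max-style MIL assumptions, which score the bag by its single most suspicious patch; the abstract's finding is that using the whole patch distribution works better for COPD.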


Report 81-25 A Simple Event Driven Program -- Stanford. H. Penny Nii

AI Classics

Each example in this series illustrates a different set of features of AGE. AGE Example Series: Number 1 describes a beginner's program.


Report 78-19 A Physiological Rule-Based System for Interpreting Pulmonary Function Test Results -- Stanford KSL

AI Classics

PUFF is now in routine use in Presbyterian Hospital, Pacific Medical Center (PMC), in San Francisco. The program produces a report, intended for patient records, that explains the clinical significance of measured quantitative test results and gives a diagnosis of the presence and severity of pulmonary disease in terms of the measured data, referral diagnosis, and patient history. "Rules", or statements of the form "IF condition THEN conclusion", are used by the physiologist and the computer system to specify the system operation. The sequence of rules used to interpret the case also specifies a line of reasoning about the case, or the detailed explanation of the interpretation of the case. The use of rules for this type of knowledge-based system is taken from the results of applied Artificial Intelligence research. In a 144-case prospective evaluation, there was a 91% overall rate of agreement between the rule-based system diagnoses and the diagnoses of the designing physiologist; there was an 89% rate of agreement between the system diagnoses and the diagnoses of a second independent physiologist.
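The IF-THEN rule mechanism described above can be sketched as a tiny forward-chaining engine. The rules and facts below are hypothetical illustrations invented for the example, not PUFF's actual knowledge base.

```python
def run_rules(facts, rules):
    """Forward-chain IF-THEN rules: repeatedly fire any rule whose
    conditions are all satisfied, until no new conclusions are added.
    The sequence of fired rules doubles as a line of reasoning."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conclusion not in facts and all(c in facts for c in conditions):
                facts.add(conclusion)
                changed = True
    return facts

# Hypothetical pulmonary-function rules (illustrative only).
rules = [
    ({"fev1/fvc low"}, "obstructive defect"),
    ({"obstructive defect", "dlco low"}, "emphysema suspected"),
]
result = run_rules({"fev1/fvc low", "dlco low"}, rules)
```

Chaining is visible here: the second rule fires only because the first one concluded "obstructive defect", mirroring how PUFF's rule sequence explains its interpretation of a case.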