Parikh, Ujas
An efficient deep neural network to find small objects in large 3D images
Park, Jungkyu, Chłędowski, Jakub, Jastrzębski, Stanisław, Witowski, Jan, Xu, Yanqi, Du, Linda, Gaddam, Sushma, Kim, Eric, Lewin, Alana, Parikh, Ujas, Plaunova, Anastasia, Chen, Sardius, Millet, Alexandra, Park, James, Pysarenko, Kristine, Patel, Shalin, Goldberg, Julia, Wegener, Melanie, Moy, Linda, Heacock, Laura, Reig, Beatriu, Geras, Krzysztof J.
3D imaging enables accurate diagnosis by providing spatial information about organ anatomy. However, using 3D images to train AI models is computationally challenging because they consist of 10 to 100 times more pixels than their 2D counterparts. To be trained with high-resolution 3D images, convolutional neural networks resort to downsampling them or projecting them to 2D. We propose an effective alternative: a neural network that enables efficient classification of full-resolution 3D medical images. Compared to off-the-shelf convolutional neural networks, our network, the 3D Globally-Aware Multiple Instance Classifier (3D-GMIC), uses 77.98%-90.05% less GPU memory and 91.23%-96.02% less computation. Although it is trained only with image-level labels, without segmentation labels, it explains its predictions by providing pixel-level saliency maps. On a dataset collected at NYU Langone Health, including 85,526 patients with full-field 2D mammography (FFDM), synthetic 2D mammography, and 3D mammography, 3D-GMIC achieves an AUC of 0.831 (95% CI: 0.769-0.887) in classifying breasts with malignant findings using 3D mammography. This is comparable to the performance of GMIC on FFDM (0.816, 95% CI: 0.737-0.878) and on synthetic 2D (0.826, 95% CI: 0.754-0.884), demonstrating that 3D-GMIC successfully classifies large 3D images even though it focuses its computation on a far smaller fraction of the input than GMIC does. 3D-GMIC thus identifies and utilizes extremely small regions of interest within 3D images consisting of hundreds of millions of pixels, dramatically reducing the associated computational burden. It also generalizes well to BCS-DBT, an external dataset from Duke University Hospital, achieving an AUC of 0.848 (95% CI: 0.798-0.896).
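The efficiency claim above rests on a two-stage, global-local design: a cheap network scans the whole volume for salient locations, and a more expensive network looks only at a handful of full-resolution patches. Below is a minimal PyTorch sketch of that pattern, not the authors' released 3D-GMIC implementation; the module depths, the number of patches, the stride-4 coordinate mapping, and the max-pooling fusion are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalLocalClassifier(nn.Module):
    """Toy global-local classifier for a single-channel 3D volume."""

    def __init__(self, k_patches=4, patch_size=64):
        super().__init__()
        self.k = k_patches
        self.p = patch_size
        # Global module: a cheap, shallow CNN over the whole volume that
        # produces a coarse saliency map (overall stride 4).
        self.global_net = nn.Sequential(
            nn.Conv3d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(32, 1, 1),
        )
        # Local module: applied only to k small full-resolution patches,
        # so its cost is independent of the input volume size.
        self.local_net = nn.Sequential(
            nn.Conv3d(1, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(), nn.Linear(32, 1),
        )

    def forward(self, volume):
        # volume: (1, 1, D, H, W) at full resolution.
        saliency = torch.sigmoid(self.global_net(volume))
        _, h, w = saliency.shape[2:]  # coarse grid dimensions
        _, idx = saliency.flatten(2).topk(self.k, dim=-1)
        scores = []
        for i in idx[0, 0]:
            # Map each coarse index back to full-resolution voxel coords.
            z = (i // (h * w)) * 4
            y = (i % (h * w) // w) * 4
            x = (i % w) * 4
            scores.append(self.local_net(self._crop(volume, z, y, x)))
        # Max over patch scores is one simple multiple-instance pooling.
        return torch.sigmoid(torch.stack(scores).max())

    def _crop(self, vol, z, y, x):
        # Zero-pad so patches near the border keep a fixed size.
        vol = F.pad(vol, (0, self.p) * 3)
        return vol[:, :, z:z + self.p, y:y + self.p, x:x + self.p]

# Usage on a dummy volume (e.g. 64 slices of 256x256):
model = GlobalLocalClassifier()
print(model(torch.randn(1, 1, 64, 256, 256)))  # scalar malignancy score
```

Because the local module only ever sees k fixed-size patches, its cost stays constant as the volume grows, which is the source of the memory and compute savings the abstract quantifies.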
Deep Neural Networks Improve Radiologists' Performance in Breast Cancer Screening
Wu, Nan, Phang, Jason, Park, Jungkyu, Shen, Yiqiu, Huang, Zhe, Zorin, Masha, Jastrzębski, Stanisław, Févry, Thibault, Katsnelson, Joe, Kim, Eric, Wolfson, Stacey, Parikh, Ujas, Gaddam, Sushma, Lin, Leng Leng Young, Ho, Kara, Weinstein, Joshua D., Reig, Beatriu, Gao, Yiming, Toth, Hildegard, Pysarenko, Kristine, Lewin, Alana, Lee, Jiyon, Airola, Krystal, Mema, Eralda, Chung, Stephanie, Hwang, Esther, Samreen, Naziya, Kim, S. Gene, Heacock, Laura, Moy, Linda, Cho, Kyunghyun, Geras, Krzysztof J.
This paper makes several contributions. In a reader study, we compared the performance of our best model to that of radiologists and found our model to be as accurate as the radiologists, both in terms of area under the ROC curve (AUC) and area under the precision-recall curve (PRAUC). We also found that a hybrid model, taking the average of the probabilities of malignancy predicted by a radiologist and by our neural network, yields more accurate predictions than either of the two separately. This suggests that our network and the radiologists learned different aspects of the task, and that our model could be effective as a second reader assisting radiologists. By releasing our model, research groups working on improving screening mammography that may not have access to a large training dataset like ours can use it directly in their research, or use our pretrained weights as an initialization to train models with less data. By making our models public, we also invite other groups to validate our results and test their robustness to shifts in the data distribution. The dataset includes 229,426 digital screening mammography exams (1,001,093 images) from 141,473 patients. For each breast, we assign two binary labels: the absence/presence of malignant findings and the absence/presence of benign findings. With left and right breasts, each exam therefore has a total of four binary labels. The labels are derived from biopsies: we have 5,832 exams with at least one biopsy performed within 120 days of the screening mammogram, and among these, biopsies confirmed malignant findings for 985 (8.4%) breasts and benign findings for 5,556 (47.6%) breasts.
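The hybrid model described above is just a probability average. The toy example below (all numbers simulated, not from the paper's reader study) illustrates why averaging two predictors with partially independent errors tends to improve AUC over either predictor alone.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=1000)  # simulated ground-truth labels
# Two "readers" with independent noise on top of the same signal.
radiologist = np.clip(y + rng.normal(0, 0.6, 1000), 0, 1)
model = np.clip(y + rng.normal(0, 0.6, 1000), 0, 1)
hybrid = (radiologist + model) / 2  # simple probability average

for name, p in [("radiologist", radiologist),
                ("model", model),
                ("hybrid", hybrid)]:
    print(f"{name:11s} AUC = {roc_auc_score(y, p):.3f}")
# When the two error patterns are not perfectly correlated, the averaged
# prediction typically achieves a higher AUC than either one alone.
```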
High-Resolution Breast Cancer Screening with Multi-View Deep Convolutional Neural Networks
Geras, Krzysztof J., Wolfson, Stacey, Shen, Yiqiu, Wu, Nan, Kim, S. Gene, Kim, Eric, Heacock, Laura, Parikh, Ujas, Moy, Linda, Cho, Kyunghyun
Advances in deep learning for natural images have prompted a surge of interest in applying similar techniques to medical images. Most of the initial attempts simply fed a medical image to a deep convolutional neural network in place of a natural image, disregarding the fundamental differences between these two types of images. Specifically, fine details are necessary for detection in medical images, unlike in natural images, where coarse structures matter most. This difference makes the existing network architectures developed for natural images inadequate, because they work on heavily downscaled inputs to reduce memory requirements, which hides the details necessary for accurate predictions. Additionally, a single exam in medical imaging often comes with a set of views that must be fused to reach a correct conclusion. In our work, we propose a multi-view deep convolutional neural network that handles a set of high-resolution medical images. We evaluate it on large-scale mammography-based breast cancer screening (BI-RADS prediction) using 886,000 images, focusing on the impact of training set size and image size on prediction accuracy. Our results show that performance increases with the size of the training set, and that the best performance can only be achieved using the original image resolution. In a reader study, performed on a random subset of the test set, we confirmed the efficacy of our model, which achieved performance comparable to a committee of radiologists when presented with the same data.
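The view-fusion idea can be sketched compactly: one convolutional column per screening view, with the pooled features concatenated before a joint classifier. The PyTorch sketch below is a hypothetical illustration, not the published architecture; the column depth, the feature width, and the three-way output (assumed here to stand for BI-RADS categories) are all placeholders.

```python
import torch
import torch.nn as nn

VIEWS = ["L-CC", "R-CC", "L-MLO", "R-MLO"]  # standard screening views

def make_column():
    # One convolutional column per view; real models are much deeper.
    return nn.Sequential(
        nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    )

class MultiViewNet(nn.Module):
    def __init__(self, n_classes=3):  # e.g. three BI-RADS categories
        super().__init__()
        self.columns = nn.ModuleDict({v: make_column() for v in VIEWS})
        self.classifier = nn.Linear(64 * len(VIEWS), n_classes)

    def forward(self, views):
        # views: dict mapping view name -> tensor of shape (N, 1, H, W)
        feats = [self.columns[v](views[v]) for v in VIEWS]
        # Concatenate per-view features, then classify jointly, so the
        # network can reason across views rather than per image.
        return self.classifier(torch.cat(feats, dim=1))

# Usage on dummy high-resolution inputs:
net = MultiViewNet()
x = {v: torch.randn(2, 1, 512, 512) for v in VIEWS}
print(net(x).shape)  # torch.Size([2, 3])
```

Keeping a separate column per view lets each one specialize in its view's geometry, while late concatenation is the simplest way to let the classifier combine evidence across all four images of an exam.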