Taha, Ahmed
Analysis of the BraTS 2023 Intracranial Meningioma Segmentation Challenge
LaBella, Dominic, Baid, Ujjwal, Khanna, Omaditya, McBurney-Lin, Shan, McLean, Ryan, Nedelec, Pierre, Rashid, Arif, Tahon, Nourel Hoda, Altes, Talissa, Bhalerao, Radhika, Dhemesh, Yaseen, Godfrey, Devon, Hilal, Fathi, Floyd, Scott, Janas, Anastasia, Kazerooni, Anahita Fathi, Kirkpatrick, John, Kent, Collin, Kofler, Florian, Leu, Kevin, Maleki, Nazanin, Menze, Bjoern, Pajot, Maxence, Reitman, Zachary J., Rudie, Jeffrey D., Saluja, Rachit, Velichko, Yury, Wang, Chunhao, Warman, Pranav, Adewole, Maruf, Albrecht, Jake, Anazodo, Udunna, Anwar, Syed Muhammad, Bergquist, Timothy, Chen, Sully Francis, Chung, Verena, Conte, Gian-Marco, Dako, Farouk, Eddy, James, Ezhov, Ivan, Khalili, Nastaran, Iglesias, Juan Eugenio, Jiang, Zhifan, Johanson, Elaine, Van Leemput, Koen, Li, Hongwei Bran, Linguraru, Marius George, Liu, Xinyang, Mahtabfar, Aria, Meier, Zeke, Moawad, Ahmed W., Mongan, John, Piraud, Marie, Shinohara, Russell Takeshi, Wiggins, Walter F., Abayazeed, Aly H., Akinola, Rachel, Jakab, András, Bilello, Michel, de Verdier, Maria Correia, Crivellaro, Priscila, Davatzikos, Christos, Farahani, Keyvan, Freymann, John, Hess, Christopher, Huang, Raymond, Lohmann, Philipp, Moassefi, Mana, Pease, Matthew W., Vollmuth, Phillipp, Sollmann, Nico, Diffley, David, Nandolia, Khanak K., Warren, Daniel I., Hussain, Ali, Fehringer, Pascal, Bronstein, Yulia, Deptula, Lisa, Stein, Evan G., Taherzadeh, Mahsa, de Oliveira, Eduardo Portela, Haughey, Aoife, Kontzialis, Marinos, Saba, Luca, Turner, Benjamin, Brüßeler, Melanie M. T., Ansari, Shehbaz, Gkampenis, Athanasios, Weiss, David Maximilian, Mansour, Aya, Shawali, Islam H., Yordanov, Nikolay, Stein, Joel M., Hourani, Roula, Moshebah, Mohammed Yahya, Abouelatta, Ahmed Magdy, Rizvi, Tanvir, Willms, Klara, Martin, Dann C., Okar, Abdullah, D'Anna, Gennaro, Taha, Ahmed, Sharifi, Yasaman, Faghani, Shahriar, Kite, Dominic, Pinho, Marco, Haider, Muhammad Ammar, Aristizabal, Alejandro, Karargyris, Alexandros, Kassem, Hasan, Pati, Sarthak, Sheller, Micah, Alonso-Basanta, Michelle, Villanueva-Meyer, Javier, Rauschecker, Andreas M., Nada, Ayman, Aboian, Mariam, Flanders, Adam E., Wiestler, Benedikt, Bakas, Spyridon, Calabrese, Evan
We describe the design and results of the BraTS 2023 Intracranial Meningioma Segmentation Challenge. The BraTS Meningioma Challenge differed from prior BraTS Glioma challenges in that it focused on meningiomas, which are typically benign extra-axial tumors with diverse radiologic and anatomical presentation and a propensity for multiplicity. Nine participating teams each developed deep-learning automated segmentation models using image data from the largest multi-institutional, systematically expert-annotated, multilabel, multi-sequence meningioma MRI dataset to date, which included 1000 training set cases, 141 validation set cases, and 283 hidden test set cases. Each case included T2, T2/FLAIR, T1, and T1Gd brain MRI sequences with associated tumor compartment labels delineating enhancing tumor, non-enhancing tumor, and surrounding non-enhancing T2/FLAIR hyperintensity. Participants' automated segmentation models were evaluated and ranked using a scoring system based on lesion-wise metrics, including the Dice similarity coefficient (DSC) and the 95th percentile Hausdorff distance. The top-ranked team had a lesion-wise median DSC of 0.976, 0.976, and 0.964 for enhancing tumor, tumor core, and whole tumor, respectively, and a corresponding mean DSC of 0.899, 0.904, and 0.871, respectively. These results serve as state-of-the-art benchmarks for future pre-operative meningioma automated segmentation algorithms. Additionally, we found that 1286 of 1424 cases (90.3%) had at least one compartment voxel abutting the edge of the skull-stripped image, which warrants further investigation into optimal pre-processing face-anonymization steps.
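To make lesion-wise scoring concrete, here is a simplified Python sketch of lesion-wise Dice. It assumes lesions are 26-connected components of a binary mask, merges all predicted components that touch a ground-truth lesion, and scores each missed lesion and each unmatched predicted component as 0. The function names are hypothetical, and the official BraTS 2023 evaluation may include further steps omitted here (for example, dilating ground-truth lesions before matching or ignoring very small components); the Hausdorff distance is not shown.

```python
from collections import deque
from itertools import product

import numpy as np

# Offsets of the 26 neighbours of a voxel (every non-zero step of -1/0/+1 per axis).
OFFSETS = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]


def label_components(mask):
    """Label 26-connected components of a 3D boolean mask with breadth-first search.

    Returns (labels, n): labels holds 1..n on foreground voxels and 0 on background.
    """
    labels = np.zeros(mask.shape, dtype=np.int32)
    n = 0
    for start in zip(*np.nonzero(mask)):
        if labels[start]:
            continue  # already reached from an earlier seed
        n += 1
        labels[start] = n
        queue = deque([start])
        while queue:
            z, y, x = queue.popleft()
            for dz, dy, dx in OFFSETS:
                nb = (z + dz, y + dy, x + dx)
                inside = all(0 <= c < s for c, s in zip(nb, mask.shape))
                if inside and mask[nb] and not labels[nb]:
                    labels[nb] = n
                    queue.append(nb)
    return labels, n


def dice(a, b):
    """Dice similarity coefficient 2|A & B| / (|A| + |B|); 1.0 when both masks are empty."""
    total = a.sum() + b.sum()
    return 1.0 if total == 0 else 2.0 * np.logical_and(a, b).sum() / total


def lesion_wise_dice(gt, pred):
    """Simplified lesion-wise Dice: one score per ground-truth lesion plus a 0 per false positive."""
    gt_labels, n_gt = label_components(gt)
    pred_labels, n_pred = label_components(pred)
    scores, matched = [], set()
    for i in range(1, n_gt + 1):
        lesion = gt_labels == i
        # Predicted components touching this lesion are merged and scored together.
        hits = {int(v) for v in np.unique(pred_labels[lesion]) if v != 0}
        matched |= hits
        pred_lesion = np.isin(pred_labels, sorted(hits)) if hits else np.zeros_like(lesion)
        scores.append(dice(lesion, pred_lesion))  # a missed lesion scores 0
    scores.extend([0.0] * (n_pred - len(matched)))  # unmatched predictions score 0
    return float(np.mean(scores)) if scores else 1.0


# Toy example: two ground-truth lesions, one found exactly, one missed, plus one stray voxel.
gt = np.zeros((1, 8, 8), dtype=bool)
gt[0, 1:3, 1:3] = True
gt[0, 5:7, 5:7] = True
pred = np.zeros_like(gt)
pred[0, 1:3, 1:3] = True
pred[0, 0, 7] = True
print(round(float(dice(gt, pred)), 3))       # whole-volume Dice 8/13 -> 0.615
print(round(lesion_wise_dice(gt, pred), 3))  # (1 + 0 + 0) / 3 -> 0.333
```

In this toy case the missed lesion and the single stray voxel each add a zero to the average, so the lesion-wise score (0.333) falls well below the whole-volume Dice (0.615).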
Problems and shortcuts in deep learning for screening mammography
Tsue, Trevor, Mombourquette, Brent, Taha, Ahmed, Matthews, Thomas Paul, Vu, Yen Nhi Truong, Su, Jason
This work reveals previously overlooked challenges in the performance and generalizability of deep learning models. We (1) identify spurious shortcuts and evaluation issues that can inflate performance and (2) propose training and analysis methods to address them. We trained an AI model to classify cancer on a retrospective dataset of 120,112 US exams (3,467 cancers) acquired from 2008 to 2017 and 16,693 UK exams (5,655 cancers) acquired from 2011 to 2015. We evaluated on a screening mammography test set of 11,593 US exams (102 cancers; 7,594 women; age 57.1 ± 11.0) and 1,880 UK exams (590 cancers; 1,745 women; age 63.3 ± 7.2). A model trained only on images of view markers (no breast) achieved a 0.691 AUC. The original model trained on both datasets achieved a 0.945 AUC on the combined US+UK dataset but, paradoxically, only 0.838 and 0.892 on the US and UK datasets, respectively. Sampling cancers equally from both datasets during training mitigated this shortcut. A similar AUC paradox occurred when evaluating diagnostic and screening exams together (0.903) versus separately (0.862 and 0.861, respectively). Removing diagnostic exams during training alleviated this bias. Finally, the model did not exhibit the AUC paradox across scanner models but still exhibited a bias toward Selenia Dimensions (SD) over Hologic Selenia (HS) exams. Analysis showed that this AUC paradox occurred when a dataset attribute had values with a higher cancer prevalence (dataset bias) and the model consequently assigned a higher probability to these attribute values (model bias). Stratifying results and balancing cancer prevalence can mitigate shortcuts during evaluation. Dataset and model bias can introduce shortcuts and the AUC paradox, potentially pervasive issues within the healthcare AI space. Our methods can verify and mitigate shortcuts while providing a clear understanding of performance.
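The AUC paradox can be reproduced with a tiny synthetic example. The sketch below computes ROC AUC as the Mann-Whitney probability that a randomly chosen cancer outscores a randomly chosen non-cancer (ties count one half), then compares the pooled AUC with stratified, per-dataset AUCs. The scores, the dataset names "A" and "B", and the helper `auc` are invented for illustration; they are not data or code from the study.

```python
from itertools import product


def auc(scores, labels):
    """ROC AUC as P(random positive scores above random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: (dataset, model score, cancer label).
# Dataset "B" has a higher cancer prevalence (dataset bias) and the model scores
# every "B" exam higher (model bias), yet within each dataset it ranks at chance.
exams = [
    ("A", 0.10, 0), ("A", 0.20, 0), ("A", 0.30, 0), ("A", 0.40, 0), ("A", 0.25, 1),
    ("B", 0.75, 0), ("B", 0.60, 1), ("B", 0.70, 1), ("B", 0.80, 1), ("B", 0.90, 1),
]

pooled = auc([s for _, s, _ in exams], [y for _, _, y in exams])
per_dataset = {
    d: auc([s for g, s, _ in exams if g == d], [y for g, _, y in exams if g == d])
    for d in ("A", "B")
}
print(f"pooled AUC = {pooled:.2f}")        # 0.80
print(f"per-dataset AUC = {per_dataset}")  # {'A': 0.5, 'B': 0.5}
```

Within each dataset the toy model ranks exams at chance, yet pooling yields 0.80 because dataset B combines a higher cancer prevalence with uniformly higher scores. Reporting the stratified values exposes the shortcut that the pooled number hides.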
Knowledge Evolution in Neural Networks
Taha, Ahmed, Shrivastava, Abhinav, Davis, Larry
Deep learning relies on the availability of a large corpus of data (labeled or unlabeled). Thus, an open and challenging question is how to train a deep network on a relatively small dataset. To tackle this question, we propose an evolution-inspired training approach to boost performance on relatively small datasets. The knowledge evolution (KE) approach splits a deep network into two hypotheses: the fit-hypothesis and the reset-hypothesis. We iteratively evolve the knowledge inside the fit-hypothesis by perturbing the reset-hypothesis for multiple generations. This approach not only boosts performance but also learns a slim network with a smaller inference cost. KE integrates seamlessly with both vanilla and residual convolutional networks. KE reduces both overfitting and the burden of data collection. We evaluate KE on various network architectures and loss functions, using relatively small datasets (e.g., CUB-200) and randomly initialized deep networks. KE achieves an absolute improvement of 21% over a state-of-the-art baseline, accompanied by a relative 73% reduction in inference cost. KE achieves state-of-the-art results on classification and metric learning benchmarks. Code is available at http://bit.ly/3uLgwYb.
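The generational loop can be sketched in NumPy as follows. This is a minimal illustration, assuming a dense network stored as a dict of weight matrices, a user-supplied `train` callable, and a fit-hypothesis defined as the top-left block of each matrix (a simplified stand-in for a kernel-level split controlled by `split_rate`). The names `fit_mask`, `knowledge_evolution`, and `train`, the small-normal re-initialization, and the default values are assumptions, not the authors' released code.

```python
import numpy as np


def fit_mask(shape, split_rate):
    """Mark the fit-hypothesis as the top-left block of a weight matrix.

    Keeps the first ceil(split_rate * n) output and input units (an assumed,
    simplified stand-in for a kernel-level split); the rest is the reset-hypothesis.
    """
    mask = np.zeros(shape, dtype=bool)
    rows = max(1, int(np.ceil(split_rate * shape[0])))
    cols = max(1, int(np.ceil(split_rate * shape[1])))
    mask[:rows, :cols] = True
    return mask


def knowledge_evolution(weights, train, split_rate=0.5, generations=3, seed=0):
    """Train for several generations, re-initializing the reset-hypothesis between them."""
    rng = np.random.default_rng(seed)
    masks = {name: fit_mask(w.shape, split_rate) for name, w in weights.items()}
    for g in range(generations):
        weights = train(weights)  # user-supplied training step (assumption)
        if g == generations - 1:
            break  # keep the final generation's weights as trained
        # Fit-hypothesis weights carry over; reset-hypothesis weights are re-drawn.
        weights = {
            name: np.where(masks[name], w, rng.normal(0.0, 0.01, size=w.shape))
            for name, w in weights.items()
        }
    return weights, masks


# Usage with a dummy "training" step that adds 1.0 to every weight.
init = {"fc1": np.ones((4, 6)), "fc2": np.ones((2, 4))}
final, masks = knowledge_evolution(init, lambda w: {k: v + 1.0 for k, v in w.items()})
print(final["fc1"][:2, :3])  # fit-hypothesis block: 1 + 3 generations = 4.0 everywhere
```

With the dummy step, the fit-hypothesis accumulates updates across all generations while the reset-hypothesis only keeps the last generation's update on top of fresh noise. Because the fit-hypothesis occupies a contiguous block in this sketch, it could be sliced out as a smaller dense network, which mirrors the intuition behind KE's reduced inference cost.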