Deprest, Thomas
A Dempster-Shafer approach to trustworthy AI with application to fetal brain MRI segmentation
Fidon, Lucas, Aertsen, Michael, Kofler, Florian, Bink, Andrea, David, Anna L., Deprest, Thomas, Emam, Doaa, Guffens, Frédéric, Jakab, András, Kasprian, Gregor, Kienast, Patric, Melbourne, Andrew, Menze, Bjoern, Mufti, Nada, Pogledic, Ivana, Prayer, Daniela, Stuempflen, Marlene, Van Elslander, Esther, Ourselin, Sébastien, Deprest, Jan, Vercauteren, Tom
Deep learning models for medical image segmentation can fail unexpectedly and spectacularly for pathological cases and for images acquired at centers different from those of the training images, with labeling errors that violate expert knowledge. Such errors undermine the trustworthiness of deep learning models for medical image segmentation. Mechanisms for detecting and correcting such failures are essential for safely translating this technology into clinics and are likely to be a requirement of future regulations on artificial intelligence (AI). In this work, we propose a trustworthy AI theoretical framework and a practical system that can augment any backbone AI system with a fallback method and a fail-safe mechanism based on Dempster-Shafer theory. Our approach relies on an actionable definition of trustworthy AI. Our method automatically discards voxel-level labels predicted by the backbone AI that violate expert knowledge and relies on a fallback for those voxels. We demonstrate the effectiveness of the proposed trustworthy AI approach on the largest reported annotated dataset of fetal MRI, consisting of 540 manually annotated fetal brain 3D T2w MRIs from 13 centers. Our trustworthy AI method improves the robustness of a state-of-the-art backbone AI for fetal brain MRIs acquired across various centers and for fetuses with various brain abnormalities.
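The abstract does not spell out the fusion rule, so the following is a minimal, illustrative sketch of how Dempster's rule of combination can fuse per-voxel backbone softmax scores with an expert-knowledge prior restricting the plausible classes; the function name, the `ignorance` parameter, and the conflict-based fallback are assumptions for illustration, not the authors' implementation.

import numpy as np

def dempster_combine(m_backbone, plausible, ignorance=0.1):
    # m_backbone: (n_voxels, n_classes) softmax probabilities, treated as
    #             masses on singleton classes.
    # plausible:  (n_voxels, n_classes) boolean mask of the classes allowed
    #             by expert knowledge at each voxel.
    # The prior puts mass (1 - ignorance) on the set of plausible classes
    # and `ignorance` on the full frame of discernment (total ignorance).
    combined = m_backbone * (ignorance + (1.0 - ignorance) * plausible)
    # Conflict K: backbone mass placed on classes excluded by the prior.
    conflict = (1.0 - ignorance) * (m_backbone * (~plausible)).sum(axis=1, keepdims=True)
    combined = combined / np.clip(1.0 - conflict, 1e-6, None)  # Dempster normalization
    return combined, conflict

# Fail-safe idea (illustrative only): where the conflict is high, discard the
# backbone labeling and use a fallback segmentation instead, e.g.
# labels = np.where(conflict.ravel() > 0.5, fallback_labels, combined.argmax(axis=1))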
Distributionally Robust Deep Learning using Hardness Weighted Sampling
Fidon, Lucas, Aertsen, Michael, Deprest, Thomas, Emam, Doaa, Guffens, Frédéric, Mufti, Nada, Van Elslander, Esther, Schwartz, Ernst, Ebner, Michael, Prayer, Daniela, Kasprian, Gregor, David, Anna L., Melbourne, Andrew, Ourselin, Sébastien, Deprest, Jan, Langs, Georg, Vercauteren, Tom
Limiting failures of machine learning systems is of paramount importance for safety-critical applications. To improve the robustness of machine learning systems, Distributionally Robust Optimization (DRO) has been proposed as a generalization of Empirical Risk Minimization (ERM). However, its use in deep learning has been severely restricted because the optimizers available for DRO are relatively inefficient compared with the widespread variants of Stochastic Gradient Descent (SGD) used for ERM. We propose SGD with hardness weighted sampling, a principled and efficient optimization method for DRO in machine learning that is particularly suited to deep learning. Similar in practice to a hard example mining strategy, the proposed algorithm is straightforward to implement and computationally as efficient as SGD-based optimizers used for deep learning, requiring minimal computational overhead. In contrast to typical ad hoc hard mining approaches, we prove the convergence of our DRO algorithm for over-parameterized deep learning networks with ReLU activation and a finite number of layers and parameters. Our experiments on fetal brain 3D MRI segmentation and brain tumor segmentation in MRI demonstrate the feasibility and usefulness of our approach. Using our hardness weighted sampling to train a state-of-the-art deep learning pipeline improves robustness to anatomical variability in automatic fetal brain 3D MRI segmentation and to image protocol variations in brain tumor segmentation. Our code is available at https://github.com/LucasFidon/HardnessWeightedSampler.
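As a rough, non-official illustration of the sampling scheme (the stale per-example losses, the softmax with temperature beta, and the class and method names below are assumptions; the reference implementation is in the linked repository):

import numpy as np

class HardnessWeightedSampler:
    # Keeps a stale loss estimate per training example and samples
    # minibatches with probability given by a softmax of these losses,
    # which approximates the DRO reweighting of the training distribution.
    def __init__(self, num_examples, beta=1.0, init_loss=1.0):
        self.beta = beta
        self.losses = np.full(num_examples, init_loss, dtype=np.float64)

    def sample(self, batch_size, rng=np.random):
        # Sampling probabilities: softmax of the stale per-example losses.
        logits = self.beta * self.losses
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        return rng.choice(len(self.losses), size=batch_size, replace=False, p=probs)

    def update(self, indices, batch_losses):
        # Refresh the stale loss estimates for the examples just seen.
        self.losses[np.asarray(indices)] = np.asarray(batch_losses)

In a training loop, sample() would replace uniform shuffling and update() would be called with the per-example losses computed during the forward pass, so that harder examples are revisited more often.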
Distributionally Robust Segmentation of Abnormal Fetal Brain 3D MRI
Fidon, Lucas, Aertsen, Michael, Mufti, Nada, Deprest, Thomas, Emam, Doaa, Guffens, Frédéric, Schwartz, Ernst, Ebner, Michael, Prayer, Daniela, Kasprian, Gregor, David, Anna L., Melbourne, Andrew, Ourselin, Sébastien, Deprest, Jan, Langs, Georg, Vercauteren, Tom
The performance of deep neural networks typically increases with the number of training images. However, not all images contribute equally to improved performance and robustness. In fetal brain MRI, abnormalities exacerbate the variability of the developing brain anatomy compared to non-pathological cases. The small number of abnormal cases typically available in clinical training datasets is unlikely to fairly represent the rich variability of abnormal developing brains. As a result, machine learning systems trained by maximizing average performance are biased toward non-pathological cases. This problem was recently referred to as hidden stratification. To be suited for clinical use, automatic segmentation methods need to reliably achieve high-quality segmentations for pathological cases as well. In this paper, we show that the state-of-the-art deep learning pipeline nnU-Net has difficulty generalizing to unseen abnormal cases. To mitigate this problem, we propose to train a deep neural network to minimize a percentile of the distribution of per-volume loss over the dataset. We show that this can be achieved using Distributionally Robust Optimization (DRO). DRO automatically increases the weight of training samples with lower performance, encouraging nnU-Net to perform more consistently across all cases. We validated our approach using a dataset of 368 fetal brain T2w MRIs, including 124 MRIs of open spina bifida cases and 51 MRIs of cases with other severe abnormalities of brain development.
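One simple way to approximate minimizing a percentile of the per-volume loss distribution is a batch-level CVaR-style relaxation that averages the few worst per-volume losses; the sketch below is an illustrative assumption for this listing, not the paper's exact DRO training scheme (which relies on reweighting the training distribution).

import torch

def worst_case_batch_loss(per_volume_losses, percentile=0.95):
    # per_volume_losses: 1-D tensor of losses, one per volume in the batch.
    # Average the worst (1 - percentile) fraction of volumes so that the
    # gradient focuses on the hardest cases; this is a CVaR upper bound on
    # the percentile of the loss distribution when the batch is representative.
    n = per_volume_losses.numel()
    k = max(1, int(round((1.0 - percentile) * n)))
    worst_k, _ = torch.topk(per_volume_losses, k)
    return worst_k.mean()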