Supervision by Denoising for Medical Image Segmentation

Young, Sean I., Dalca, Adrian V., Ferrante, Enzo, Golland, Polina, Metzler, Christopher A., Fischl, Bruce, Iglesias, Juan Eugenio

arXiv.org Artificial Intelligence 

Abstract--Learning-based image reconstruction models, such as those based on the U-Net, require a large set of labeled images if good generalization is to be guaranteed. In some imaging domains, however, labeled data with pixel- or voxel-level label accuracy are scarce due to the cost of acquiring them. This problem is exacerbated further in domains like medical imaging, where there is no single ground truth label, resulting in large amounts of repeat variability in the labels. Therefore, training reconstruction networks to generalize better by learning from both labeled and unlabeled examples (called semi-supervised learning) is a problem of practical and theoretical interest. However, traditional semi-supervised learning methods for image reconstruction often necessitate handcrafting a differentiable regularizer specific to some given imaging problem, which can be extremely time-consuming. In this work, we propose "supervision by denoising" (SUD), a framework to supervise reconstruction models using their own denoised output as labels. SUD unifies stochastic averaging and spatial denoising techniques under a spatio-temporal denoising framework and alternates denoising and model weight update steps in an optimization framework for semi-supervision. As example applications, we apply SUD to two problems from biomedical imaging--anatomical brain reconstruction (3D) and cortical parcellation (2D)--to demonstrate a significant improvement in reconstruction over supervised-only and ensembling baselines.

While reconstruction models such as those based on the U-Net [5] typically outperform handcrafted models in many imaging problems, they can involve millions of parameters and, as a result, have a tendency to overfit training data and generalize poorly to previously unseen images at test time--a problem also exacerbated by distribution shift [6]. Regularization of the reconstruction network has proved extremely useful for imposing topological or spatial priors on the reconstruction [18], [19] and for semi-supervised learning (SSL). SSL methods based on regularization suffer neither from limited diversity of augmented data nor domain gaps resulting from training
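The alternating scheme the abstract describes (denoise the model's own predictions in space and over training steps, then reuse them as labels for unlabeled images) can be made concrete with a short sketch. The toy network, the Gaussian spatial denoiser, and the hyperparameters alpha (temporal averaging decay) and lam (unsupervised loss weight) below are illustrative assumptions, not the authors' implementation:

    # Minimal sketch of the SUD idea: supervise a segmentation model on
    # unlabeled data with spatio-temporally denoised copies of its own output.
    # All names and hyperparameters here are assumptions for illustration.
    import torch
    import torch.nn.functional as F

    def gaussian_blur(p, kernel_size=5, sigma=1.0):
        """Spatial denoising: separable depthwise Gaussian smoothing."""
        coords = torch.arange(kernel_size, dtype=torch.float32) - kernel_size // 2
        g = torch.exp(-coords ** 2 / (2 * sigma ** 2))
        g = g / g.sum()
        c = p.shape[1]
        kx = g.view(1, 1, 1, -1).repeat(c, 1, 1, 1)  # (C,1,1,K) row kernel
        ky = g.view(1, 1, -1, 1).repeat(c, 1, 1, 1)  # (C,1,K,1) column kernel
        p = F.conv2d(p, kx, padding=(0, kernel_size // 2), groups=c)
        return F.conv2d(p, ky, padding=(kernel_size // 2, 0), groups=c)

    torch.manual_seed(0)
    model = torch.nn.Conv2d(1, 4, 3, padding=1)   # stand-in for a U-Net
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)

    x_l = torch.randn(4, 1, 32, 32)               # labeled images (toy data)
    y_l = torch.randint(0, 4, (4, 32, 32))        # per-pixel class labels
    x_u = torch.randn(4, 1, 32, 32)               # fixed unlabeled images
    ema = None                                    # temporal average of predictions
    alpha, lam = 0.9, 0.5                         # EMA decay, unsupervised weight

    for step in range(100):
        # Denoising step: smooth the current unlabeled predictions in space,
        # then average them over training steps (time).
        with torch.no_grad():
            p_u = gaussian_blur(torch.softmax(model(x_u), dim=1))
            ema = p_u if ema is None else alpha * ema + (1 - alpha) * p_u

        # Weight-update step: supervised loss plus a consistency term that
        # treats the denoised predictions as pseudo-labels.
        loss = F.cross_entropy(model(x_l), y_l) \
             + lam * F.mse_loss(torch.softmax(model(x_u), dim=1), ema)
        opt.zero_grad()
        loss.backward()
        opt.step()

In a realistic setup the temporal averages would be stored per example across the whole unlabeled set, and the Gaussian filter could be swapped for any other spatial denoiser; the sketch only fixes the alternation between the denoising and weight-update steps.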