Unlike classification where the end result of the very deep network is the only important thing, semantic segmentation not only requires discrimination at pixel level but also a mechanism to project the discriminative features learnt at different stages of the encoder onto the pixel space. Different approaches employ different mechanisms as a part of the decoding mechanism. Let's explore the 3 main approaches: The region-based methods generally follow the "segmentation using recognition" pipeline, which first extracts free-form regions from an image and describes them, followed by region-based classification. At test time, the region-based predictions are transformed to pixel predictions, usually by labeling a pixel according to the highest scoring region that contains it. R-CNN (Regions with CNN feature) is one representative work for the region-based methods.
Automatic prostate segmentation in transrectal ultrasound (TRUS) images is of essential importance for image-guided prostate interventions and treatment planning. However, developing such automatic solutions remains very challenging due to the missing/ambiguous boundary and inhomogeneous intensity distribution of the prostate in TRUS, as well as the large variability in prostate shapes. This paper develops a novel 3D deep neural network equipped with attention modules for better prostate segmentation in TRUS by fully exploiting the complementary information encoded in different layers of the convolutional neural network (CNN). Our attention module utilizes the attention mechanism to selectively leverage the multilevel features integrated from different layers to refine the features at each individual layer, suppressing the non-prostate noise at shallow layers of the CNN and increasing more prostate details into features at deep layers. Experimental results on challenging 3D TRUS volumes show that our method attains satisfactory segmentation performance. The proposed attention mechanism is a general strategy to aggregate multi-level deep features and has the potential to be used for other medical image segmentation tasks. The code is publicly available at https://github.com/wulalago/DAF3D.
In this story, SharpMask, by Facebook AI Research (FAIR), is reviewed. Encoder decoder architecture was starting to be common from the year of 2016. By concatenating the feature maps at top down pass to the feature maps at bottom up pass, the performance can be boosted further. SharpMask obtained 2nd place in MS COCO Segmentation challenge and 2nd place in MS COCO Detection challenge. It has been published in 2016 ECCV, with over 200 citations.
Deep convolutional semantic segmentation (DCSS) learning doesn't converge to an optimal local minimum with random parameters initializations; a pre-trained model on the same domain becomes necessary to achieve convergence.In this work, we propose a joint cooperative end-to-end learning method for DCSS. It addresses many drawbacks with existing deep semantic segmentation learning; the proposed approach simultaneously learn both segmentation and classification; taking away the essential need of the pre-trained model for learning convergence. We present an improved inception based architecture with partial attention gating (PAG) over encoder information. The PAG also adds to achieve faster convergence and better accuracy for segmentation task. We will show the effectiveness of this learning on a diabetic retinopathy classification and segmentation dataset.
Kwak, Suha (Daegu Gyeongbuk Institute of Science and Technology (DGIST)) | Hong, Seunghoon (Pohang University of Science and Technology (POSTECH)) | Han, Bohyung (Pohang University of Science and Technology (POSTECH))
We propose a weakly supervised semantic segmentation algorithm based on deep neural networks, which relies on image-level class labels only. The proposed algorithm alternates between generating segmentation annotations and learning a semantic segmentation network using the generated annotations. A key determinant of success in this framework is the capability to construct reliable initial annotations given image-level labels only. To this end, we propose Superpixel Pooling Network (SPN), which utilizes superpixel segmentation of input image as a pooling layout to reflect low-level image structure for learning and inferring semantic segmentation. The initial annotations generated by SPN are then used to learn another neural network that estimates pixel-wise semantic labels. The architecture of the segmentation network decouples semantic segmentation task into classification and segmentation so that the network learns class-agnostic shape prior from the noisy annotations. It turns out that both networks are critical to improve semantic segmentation accuracy. The proposed algorithm achieves outstanding performance in weakly supervised semantic segmentation task compared to existing techniques on the challenging PASCAL VOC 2012 segmentation benchmark.