At Qure, we regularly work on segmentation and object detection problems, and we were therefore interested in reviewing the current state of the art. In this post, I review the literature on semantic segmentation. Although the results are not directly applicable to medical images, I review these papers because research on natural images is much more mature than that on medical images. The post is organized as follows: I first explain the semantic segmentation problem, give an overview of the approaches, and summarize a few interesting papers. In a later post, I'll explain why medical images are different from natural images and examine how the approaches from this review fare on a dataset representative of medical images.
Semantic segmentation remains computationally intensive for embedded deployment, even with the rapid growth of computational power. Efficient network design is therefore critical, especially for applications such as automated driving that require real-time performance. Recently, there has been a lot of research on designing efficient encoders, which are mostly task agnostic. Unlike in image classification and bounding-box object detection, however, the decoder is computationally expensive as well in semantic segmentation. In this work, we focus on efficient design of the segmentation decoder and assume that an efficient encoder has already been designed to provide shared features for a multi-task learning system. We design a novel efficient non-bottleneck layer and a family of decoders that fit into a small run-time budget, using VGG10 as the efficient encoder. We demonstrate on our dataset that experimentation with various design choices led to an improvement of 10% over the baseline performance.
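The abstract does not spell out its non-bottleneck layer, but a common way to make such a residual layer cheaper is to factorize each k x k convolution into a k x 1 followed by a 1 x k convolution, as in ERFNet's non-bottleneck-1D. The sketch below is only an illustration of that parameter-saving idea (the channel width of 128 is a hypothetical decoder width, not a value from the paper):

```python
def conv_params(k_h, k_w, c_in, c_out, bias=True):
    """Parameter count of a single 2-D convolution layer."""
    return k_h * k_w * c_in * c_out + (c_out if bias else 0)

channels = 128  # hypothetical decoder width, for illustration only

# Standard non-bottleneck residual layer: two 3x3 convolutions.
standard = 2 * conv_params(3, 3, channels, channels)

# Factorized variant: each 3x3 replaced by a 3x1 followed by a 1x3.
factorized = 2 * (conv_params(3, 1, channels, channels)
                  + conv_params(1, 3, channels, channels))

print(standard, factorized, round(factorized / standard, 3))
```

For this width, factorization cuts the layer's parameters (and, roughly, its multiply-accumulates) by about a third, which is the kind of saving that lets a decoder fit a small run-time budget.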
In neuroimaging, multiple automated methods have been developed for various problems, such as brain extraction, segmentation of anatomical regions of interest (ROIs), white matter lesion (WML) segmentation, and segmentation of brain tumor sub-regions. Importantly, each of these problems has its own specific challenges, mainly due to variations in image modalities and in the imaging signatures that best characterize target regions. These variations motivated the development of a large number of distinct task-specific segmentation methods (Kalavathi P, 2016; Anbeek et al., 2004; Eugenio Iglesias and Sabuncu, 2014; Gordillo et al., 2013; Despotovic et al., 2015). Machine learning has played a key role in enabling novel methods that achieve accuracy comparable to, or surpassing, that of human raters. In the commonly used supervised learning framework, examples with ground-truth labels are presented to the learning algorithm in order to construct a model that learns the imaging patterns that characterize the target segmentations.
We focus on the very challenging task of semantic segmentation for autonomous driving systems, which must deliver decent semantic segmentation results for traffic-critical objects in real time. In this paper, we propose a very efficient yet powerful deep neural network for driving-scene semantic segmentation, termed the Driving Segmentation Network (DSNet). DSNet achieves a state-of-the-art balance between accuracy and inference speed through efficient units and an architecture design inspired by ShuffleNet V2 and ENet. More importantly, DSNet emphasizes the classes most critical to driving decision making through our novel Driving Importance-weighted Loss. We evaluate DSNet on the Cityscapes dataset; it achieves 71.8% mean Intersection-over-Union (IoU) on the validation set and 69.3% on the test set. Class-wise IoU scores show that the Driving Importance-weighted Loss improves most driving-critical classes by a large margin. Compared with ENet, DSNet is 18.9% more accurate and more than 1.1 times faster, which implies great potential for autonomous driving applications.
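The published formulation of the Driving Importance-weighted Loss is not given here, but the general idea of a class-weighted segmentation objective can be sketched as a per-pixel cross-entropy scaled by a per-class importance factor. Everything below (weight values, normalization, the toy inputs) is an illustrative assumption, not the paper's exact loss:

```python
import numpy as np

def importance_weighted_ce(probs, labels, class_weights):
    """Mean per-pixel cross-entropy, weighted by class importance.

    probs:         (N, C) softmax probabilities, one row per pixel
    labels:        (N,)   integer class id per pixel
    class_weights: (C,)   importance weight per class (assumed values)
    """
    eps = 1e-12  # numerical safety for log
    pixel_ce = -np.log(probs[np.arange(len(labels)), labels] + eps)
    return float(np.mean(class_weights[labels] * pixel_ce))

# Toy example: 3 pixels, 2 classes; class 1 stands in for a
# driving-critical class such as pedestrian.
probs = np.array([[0.9, 0.1],
                  [0.2, 0.8],
                  [0.6, 0.4]])
labels = np.array([0, 1, 1])

uniform = importance_weighted_ce(probs, labels, np.array([1.0, 1.0]))
weighted = importance_weighted_ce(probs, labels, np.array([1.0, 2.0]))
print(uniform, weighted)
```

Upweighting the critical class makes its misclassified pixels contribute more to the loss, which is the mechanism by which training is pushed to improve those classes' IoU.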
Deep Convolutional Neural Networks (DCNNs) have achieved remarkable success in various computer vision applications, and the task of semantic segmentation is no exception to this trend. This piece provides an introduction to semantic segmentation with a hands-on TensorFlow implementation. Regular image classification DCNNs share a similar structure: they take an image as input and output a single value representing the category of that image. Usually, classification DCNNs consist of four main operations. Note that in this setup, we categorize an image as a whole.
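The four main operations in a classification DCNN are typically convolution, a non-linearity (ReLU), pooling, and a fully connected layer. The NumPy sketch below walks through one pass of each on a toy input; shapes and weights are arbitrary random values, not a trained model:

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))     # toy single-channel image
kernel = rng.standard_normal((3, 3))    # toy convolution filter

# 1) Convolution (valid padding): slide the 3x3 kernel over the image.
conv = np.array([[np.sum(image[i:i + 3, j:j + 3] * kernel)
                  for j in range(6)] for i in range(6)])

# 2) Non-linearity: ReLU zeroes out negative responses.
relu = np.maximum(conv, 0.0)

# 3) Pooling: 2x2 max pooling halves the spatial resolution.
pooled = relu.reshape(3, 2, 3, 2).max(axis=(1, 3))

# 4) Fully connected layer: flatten, then project to class scores.
fc_weights = rng.standard_normal((9, 10))   # 10 hypothetical classes
logits = pooled.reshape(-1) @ fc_weights
predicted_class = int(np.argmax(logits))    # one label for the whole image

print(conv.shape, pooled.shape, logits.shape, predicted_class)
```

The final `argmax` is what makes this a whole-image classifier: the spatial detail is collapsed into a single label, which is exactly what semantic segmentation must avoid by predicting a class per pixel.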