Have you ever wondered how seamlessly we can drive a car, and why it is so difficult to get a computer to do the same? It is because our minds are highly evolved and complex, and embedding that complexity into a computer is challenging. Today we will cover a tiny fraction of the journey towards self-driving cars. The task I will be discussing in this post is called semantic segmentation. Segmentation, as the name suggests, is the act of dividing something into separate parts.
Sourabh Vora, Alex H. Lang, Bassam Helou, Oscar Beijbom
Camera and lidar are important sensor modalities for robotics in general and self-driving cars in particular. The sensors provide complementary information, offering an opportunity for tight sensor fusion. Surprisingly, lidar-only methods outperform fusion methods on the main benchmark datasets, suggesting a gap in the literature. In this work, we propose PointPainting: a sequential fusion method to fill this gap. PointPainting works by projecting lidar points into the output of an image-only semantic segmentation network and appending the class scores to each point. The appended (painted) point cloud can then be fed to any lidar-only method. Experiments show large improvements on three different state-of-the-art methods, PointRCNN, VoxelNet and PointPillars, on the KITTI and nuScenes datasets. The painted version of PointRCNN represents a new state of the art on the KITTI leaderboard for the bird's-eye view detection task. In ablation, we study how the effects of painting depend on the quality and format of the semantic segmentation output, and demonstrate how latency can be minimized through pipelining.
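To make the sequential fusion idea concrete, here is a minimal sketch of the painting step described in the abstract: project each lidar point into the image with the camera calibration, look up the per-pixel class scores from the segmentation output, and append them to the point. The function name, argument layout, and calibration conventions below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def paint_points(points_xyz, seg_scores, lidar_to_cam, cam_intrinsics):
    """Append per-pixel semantic class scores to each lidar point (sketch).

    points_xyz     : (N, 3) lidar points in the lidar frame
    seg_scores     : (H, W, C) softmax output of an image segmentation network
    lidar_to_cam   : (4, 4) homogeneous lidar-to-camera extrinsic transform
    cam_intrinsics : (3, 3) camera intrinsic matrix
    """
    n = points_xyz.shape[0]
    h, w, c = seg_scores.shape

    # Transform points into the camera frame (homogeneous coordinates).
    pts_h = np.hstack([points_xyz, np.ones((n, 1))])
    pts_cam = (lidar_to_cam @ pts_h.T)[:3]          # (3, N)

    # Project onto the image plane.
    uvw = cam_intrinsics @ pts_cam                  # (3, N)
    u = uvw[0] / uvw[2]
    v = uvw[1] / uvw[2]

    # Keep only points in front of the camera that land inside the image.
    valid = (uvw[2] > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)

    # Gather class scores; points outside the image keep zero scores.
    scores = np.zeros((n, c), dtype=seg_scores.dtype)
    scores[valid] = seg_scores[v[valid].astype(int), u[valid].astype(int)]

    # Painted point cloud: original coordinates plus appended class scores.
    return np.hstack([points_xyz, scores])
```

The resulting (N, 3 + C) painted array is what a downstream lidar-only detector would consume in place of the raw point cloud.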
Zhiling Guo, Hiroaki Shengoku, Guangming Wu, Qi Chen, Wei Yuan, Xiaodan Shi, Xiaowei Shao, Yongwei Xu, Ryosuke Shibasaki
The automatic digitizing of paper maps is a significant and challenging task for both academia and industry. As an important step in map digitizing, semantic segmentation still relies mainly on manual visual interpretation, which is inefficient. In this study, we select urban planning maps as a representative sample and investigate the feasibility of using a U-shaped fully convolutional architecture to perform end-to-end map semantic segmentation. The experimental results obtained from the test area in the Shibuya district, Tokyo, demonstrate that the proposed method achieves a very high Jaccard similarity coefficient of 93.63% and an overall accuracy of 99.36%. With a GPGPU and cuDNN implementation, the processing time for the whole Shibuya district is less than three minutes. The results indicate that the proposed method can serve as a viable tool for the urban planning map semantic segmentation task, with high accuracy and efficiency.
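For readers unfamiliar with the U-shaped fully convolutional design referenced above, the following is a minimal, illustrative sketch in PyTorch. The depth, channel widths, and class count are arbitrary placeholders, not the configuration used in the study.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # Two 3x3 convolutions with ReLU, the basic unit on both paths.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    """Two-level U-shaped fully convolutional network (illustrative only)."""
    def __init__(self, in_ch=3, num_classes=4):
        super().__init__()
        self.enc1 = conv_block(in_ch, 32)
        self.enc2 = conv_block(32, 64)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = conv_block(64, 128)
        self.up2 = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.dec2 = conv_block(128, 64)
        self.up1 = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec1 = conv_block(64, 32)
        self.head = nn.Conv2d(32, num_classes, 1)  # per-pixel class logits

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))   # skip connection
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))  # skip connection
        return self.head(d1)  # (N, num_classes, H, W) logits
```

For example, `TinyUNet(in_ch=3, num_classes=4)(torch.randn(1, 3, 256, 256))` yields a `(1, 4, 256, 256)` per-pixel score map; the skip connections are what give the architecture its U shape and preserve fine spatial detail.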
Precise and accurate predictions over boundary areas are essential for semantic segmentation. However, the commonly used convolutional operators tend to smooth and blur local detail cues, making it difficult for deep models to generate accurate boundary predictions. In this paper, we introduce an operator-level approach to enhance semantic boundary awareness and thereby improve the predictions of deep semantic segmentation models. Specifically, we formulate boundary feature enhancement as an anisotropic diffusion process. We propose a novel learnable approach called the semantic diffusion network (SDN) to approximate this diffusion process; it contains a parameterized semantic difference convolution operator followed by a feature fusion module, and constructs a differentiable mapping from the original backbone features to boundary-aware features.
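As a rough, illustrative sketch only, and not the authors' SDN implementation, the block below shows one way a learnable, difference-driven diffusion step followed by a feature fusion module could be wired up: feature updates are driven by differences to a local neighborhood mean, scaled by a content-dependent diffusivity, and the diffused features are fused back with the backbone features through a 1x1 convolution. All module and parameter names are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DifferenceDiffusionBlock(nn.Module):
    """Illustrative boundary-enhancement block; not the paper's exact SDN."""
    def __init__(self, channels, steps=2):
        super().__init__()
        self.steps = steps
        # Content-dependent gate acting like a spatially varying diffusivity.
        self.gate = nn.Conv2d(channels, channels, kernel_size=1)
        # 1x1 fusion of the diffused features with the original backbone features.
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, feats):
        x = feats
        c = feats.size(1)
        # Fixed 3x3 depthwise averaging kernel for the local neighborhood mean.
        kernel = feats.new_ones(c, 1, 3, 3) / 9.0
        for _ in range(self.steps):
            local_mean = F.conv2d(x, kernel, padding=1, groups=c)
            # Update driven by the difference to the neighborhood mean,
            # scaled by a learned, feature-dependent diffusivity in (0, 1).
            diffusivity = torch.sigmoid(self.gate(x))
            x = x + diffusivity * (local_mean - x)
        # Feature fusion: combine original and diffused features.
        return self.fuse(torch.cat([feats, x], dim=1))
```

The key design point mirrored here is that the update at each location depends on local feature differences rather than plain smoothing, which is what lets such a block sharpen rather than blur boundary cues.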
The reviews were leaning positive, and after the rebuttal, two of the reviewers recommended acceptance, while R3 remained at the "marginally below" score. I agree that most of the reviewers' comments were addressed in the rebuttal, and concur with the majority vote here. The paper is an incremental but worthwhile addition to the evolving semantic segmentation literature. I strongly encourage the authors to incorporate the rebuttal responses into the final version and, in particular, to make sure to include some qualitative results.