3D point cloud segmentation is an important function that helps robots understand the layout of their surrounding environment and perform tasks such as grasping objects, avoiding obstacles, and finding landmarks. Current segmentation methods are mostly class-specific, many of which are tuned to work with specific object categories and may not be generalizable to different types of scenes. This research proposes a learnable region growing method for class-agnostic point cloud segmentation, specifically for the task of instance label prediction. The proposed method is able to segment any class of objects using a single deep neural network without any assumptions about their shapes and sizes. The deep neural network is trained to predict how to add or remove points from a point cloud region to morph it into incrementally more complete regions of an object instance. Segmentation results on the S3DIS and ScanNet datasets show that the proposed method outperforms competing methods by 1%-9% on 6 different evaluation metrics.
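As a rough illustration of the add/remove loop described in the abstract above, the following minimal sketch grows one region from a seed point. It is not the authors' implementation: the function names, the `radius` and `max_steps` parameters, and the `predict_changes` callback (a stand-in for the trained network) are all assumptions made for this example.

```python
import numpy as np

def grow_region(points, seed_idx, predict_changes, radius=0.3, max_steps=100):
    """Grow a single region from a seed point (illustrative sketch).

    predict_changes(points, region_idx, candidate_idx) is a hypothetical
    stand-in for the learned model. It returns two boolean masks:
    which candidates to add and which current region points to remove.
    """
    in_region = np.zeros(len(points), dtype=bool)
    in_region[seed_idx] = True
    for _ in range(max_steps):
        region_idx = np.flatnonzero(in_region)
        # Candidates: points outside the region within `radius` of any region point.
        dists = np.linalg.norm(
            points[:, None, :] - points[None, region_idx, :], axis=2
        )
        near = (dists.min(axis=1) <= radius) & ~in_region
        candidate_idx = np.flatnonzero(near)
        add_mask, remove_mask = predict_changes(points, region_idx, candidate_idx)
        to_add = candidate_idx[add_mask]
        to_remove = region_idx[remove_mask]
        if to_add.size == 0 and to_remove.size == 0:
            break  # the region has stopped changing
        in_region[to_add] = True
        in_region[to_remove] = False
        if not in_region.any():
            break  # every point was removed; nothing left to grow from
    return in_region

def accept_all(points, region_idx, candidate_idx):
    """Toy predictor: add every candidate, remove nothing."""
    return (np.ones(len(candidate_idx), dtype=bool),
            np.zeros(len(region_idx), dtype=bool))
```

With the toy `accept_all` predictor the loop reduces to classical distance-based region growing. A learned predictor would replace it to decide, per step, which nearby points belong to the same object instance.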
Data can take on a variety of forms. For processing visual information, images are extremely common. Images store a two-dimensional grid of pixels that often represent our three-dimensional world. Some of the most successful advances in machine learning have come from problems involving images. However, for capturing data in 3D directly, it is less common to have a three-dimensional array of pixels representing a full volume.
In this paper, we propose one novel model for point cloud semantic segmentation, which exploits both the local and global structures within the point cloud based on the contextual point representations. Specifically, we enrich each point representation by performing one novel gated fusion on the point itself and its contextual points. Afterwards, based on the enriched representation, we propose one novel graph pointnet module, relying on the graph attention block to dynamically compose and update each point representation within the local point cloud structure. Finally, we resort to the spatial-wise and channel-wise attention strategies to exploit the point cloud global structure and thereby yield the resulting semantic label for each point. Extensive results on the public point cloud databases, namely the S3DIS and ScanNet datasets, demonstrate the effectiveness of our proposed model, outperforming the state-of-the-art approaches. Our code for this paper is available at https://github.com/fly519/ELGS.
Scene understanding based on LiDAR point clouds is an essential task for autonomous cars to drive safely. It often employs spherical projection to map 3D point clouds into multi-channel 2D images for semantic segmentation. Most existing methods simply stack different point attributes/modalities (e.g. coordinates, intensity, depth, etc.) as image channels to increase information capacity, but ignore the distinct characteristics of point attributes in different image channels. We design FPS-Net, a convolutional fusion network that exploits the uniqueness and discrepancy among the projected image channels for optimal point cloud segmentation. FPS-Net adopts an encoder-decoder structure. Instead of simply stacking multiple channel images as a single input, we group them into different modalities to first learn modality-specific features separately and then map the learned features into a common high-dimensional feature space for pixel-level fusion and learning. Specifically, we design a residual dense block with multiple receptive fields as a building block in the encoder, which preserves detailed information in each modality and learns hierarchical modality-specific and fused features effectively. In the FPS-Net decoder, we likewise use a recurrent convolution block to hierarchically decode fused features into the output space for pixel-level classification. Extensive experiments conducted on two widely adopted point cloud datasets show that FPS-Net achieves superior semantic segmentation compared with state-of-the-art projection-based methods. In addition, the proposed modality fusion idea is compatible with typical projection-based methods and can be incorporated into them with consistent performance improvements.
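To make the projection step concrete, here is a minimal NumPy sketch of a spherical projection that produces a five-channel image (x, y, z, intensity, depth) and then splits it into per-modality groups. It is not the FPS-Net code: the image size, the vertical field of view, the channel order, and the function names are assumptions chosen for illustration.

```python
import numpy as np

def spherical_projection(points, intensity, height=64, width=2048,
                         fov_up_deg=3.0, fov_down_deg=-25.0):
    """Project an (N, 3) point cloud onto a (5, height, width) range image.

    All defaults are illustrative assumptions, not values from the paper.
    Channels: 0-2 = x, y, z; 3 = intensity; 4 = depth.
    """
    fov_down = np.radians(fov_down_deg)
    fov = np.radians(fov_up_deg) - fov_down
    depth = np.linalg.norm(points, axis=1)
    valid = depth > 0  # drop points at the sensor origin
    points, intensity, depth = points[valid], intensity[valid], depth[valid]
    x, y, z = points.T
    yaw = np.arctan2(y, x)        # horizontal angle
    pitch = np.arcsin(z / depth)  # vertical angle
    # Map angles to pixel coordinates.
    u = 0.5 * (1.0 - yaw / np.pi) * width
    v = (1.0 - (pitch - fov_down) / fov) * height
    u = np.clip(np.floor(u), 0, width - 1).astype(np.int64)
    v = np.clip(np.floor(v), 0, height - 1).astype(np.int64)
    # Write far points first so the nearest point in each pixel is kept.
    order = np.argsort(depth)[::-1]
    channels = np.stack([x, y, z, intensity, depth])
    image = np.zeros((5, height, width), dtype=np.float32)
    image[:, v[order], u[order]] = channels[:, order]
    return image

def split_modalities(image):
    """Group channels by modality instead of feeding them as one stack."""
    return {
        "coordinates": image[0:3],
        "intensity": image[3:4],
        "depth": image[4:5],
    }
```

Each entry returned by `split_modalities` could then be passed to its own encoder branch before fusion, which is the grouping idea the abstract describes. The actual network layers are outside the scope of this sketch.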
LatticeNet: Fast Point Cloud Segmentation Using Permutohedral Lattices
Radu Alexandru Rosu, Peer Schütt, Jan Quenzel, Sven Behnke
Abstract: Deep convolutional neural networks (CNNs) have shown outstanding performance in the task of semantically segmenting images. However, applying the same methods on 3D data still poses challenges due to the heavy memory requirements and the lack of structured data. Here, we propose LatticeNet, a novel approach for 3D semantic segmentation, which takes as input raw point clouds. A PointNet describes the local geometry which we embed into a sparse permutohedral lattice. The lattice allows for fast convolutions while keeping a low memory footprint. Further, we introduce DeformSlice, a novel learned data-dependent interpolation for projecting lattice features back onto the point cloud. We present results of 3D segmentation on various datasets where our method achieves state-of-the-art performance.
I. INTRODUCTION
Environment understanding is a crucial ability for autonomous agents.