3D point cloud segmentation is an important function that helps robots understand the layout of their surrounding environment and perform tasks such as grasping objects, avoiding obstacles, and finding landmarks. Current segmentation methods are mostly class-specific, many of which are tuned to work with specific object categories and may not be generalizable to different types of scenes. This research proposes a learnable region growing method for class-agnostic point cloud segmentation, specifically for the task of instance label prediction. The proposed method is able to segment any class of objects using a single deep neural network without any assumptions about their shapes and sizes. The deep neural network is trained to predict how to add or remove points from a point cloud region to morph it into incrementally more complete regions of an object instance. Segmentation results on the S3DIS and ScanNet datasets show that the proposed method outperforms competing methods by 1%-9% on 6 different evaluation metrics.
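As a rough illustration of the region growing idea, the sketch below grows an instance region from a seed point and repeats the process until every point has an instance label. It is not the proposed method: a fixed distance threshold (the hypothetical `radius` parameter) stands in for the learned network that predicts which points to add or remove, and the sketch only ever adds points. The function names `grow_region` and `segment` are assumptions made for this example.

```python
import math


def grow_region(points, seed_index, radius=0.5):
    """Grow a region from a seed by repeatedly adding nearby points.

    Assumption: a fixed distance threshold replaces the learned
    add/remove predictor; points are never removed in this sketch.
    """
    region = {seed_index}
    frontier = [seed_index]
    while frontier:
        current = frontier.pop()
        for j, candidate in enumerate(points):
            if j in region:
                continue
            # Add any point within `radius` of a point already in the region.
            if math.dist(points[current], candidate) <= radius:
                region.add(j)
                frontier.append(j)
    return region


def segment(points, radius=0.5):
    """Assign an instance id to every point by growing regions from unlabeled seeds."""
    labels = [-1] * len(points)
    next_id = 0
    for i in range(len(points)):
        if labels[i] != -1:
            continue  # already part of an earlier region
        for j in grow_region(points, i, radius):
            labels[j] = next_id
        next_id += 1
    return labels


# Two well-separated groups of points (illustrative data only).
points = [(0.0, 0.0, 0.0), (0.3, 0.0, 0.0), (0.6, 0.0, 0.0),
          (5.0, 5.0, 0.0), (5.2, 5.0, 0.0)]
labels = segment(points)
print(labels)
```

No object class appears anywhere in the loop, which is what makes this style of segmentation class-agnostic; in the learned variant, the distance test would be replaced by the network's per-point add and remove predictions.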
In this paper, we propose one novel model for point cloud semantic segmentation, which exploits both the local and global structures within the point cloud based on the contextual point representations. Specifically, we enrich each point representation by performing one novel gated fusion on the point itself and its contextual points. Afterwards, based on the enriched representation, we propose one novel graph pointnet module, relying on the graph attention block to dynamically compose and update each point representation within the local point cloud structure. Finally, we resort to the spatial-wise and channel-wise attention strategies to exploit the point cloud global structure and thereby yield the resulting semantic label for each point. Extensive results on the public point cloud databases, namely the S3DIS and ScanNet datasets, demonstrate the effectiveness of our proposed model, outperforming the state-of-the-art approaches. Our code for this paper is available at https://github.com/fly519/ELGS.
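To make the idea of gated fusion concrete, the following sketch shows one generic form of it: a sigmoid gate computed from a point feature and the mean of its contextual point features decides how much of each to keep. This is an assumption for illustration only; the exact formulation in the paper may differ, and the names `gated_fusion`, `gate_weights`, and `gate_bias` are hypothetical. In a trained model the gate weights would be learned.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(point_feat, context_feats, gate_weights, gate_bias=0.0):
    """Blend a point feature with the mean feature of its contextual points.

    Assumption: a single scalar gate is computed from the concatenation
    [point_feat, mean_context]; this is one common gating form, not
    necessarily the one used in the paper.
    """
    dim = len(point_feat)
    # Average the contextual point features per dimension.
    context = [sum(f[d] for f in context_feats) / len(context_feats)
               for d in range(dim)]
    concat = list(point_feat) + context
    gate = sigmoid(sum(w * x for w, x in zip(gate_weights, concat)) + gate_bias)
    # gate close to 1 keeps the point's own feature, close to 0 keeps the context.
    return [gate * p + (1.0 - gate) * c for p, c in zip(point_feat, context)]


point = [1.0, 3.0]
neighbors = [[3.0, 5.0], [5.0, 7.0]]
# Zero weights give a gate of exactly 0.5, i.e. an even blend.
fused = gated_fusion(point, neighbors, gate_weights=[0.0, 0.0, 0.0, 0.0])
print(fused)
```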
In multimodal traffic monitoring, we gather traffic statistics for distinct transportation modes, such as pedestrians, cars and bicycles, in order to analyze and improve people's daily mobility in terms of safety and convenience. On account of its robustness to bad light and adverse weather conditions, and its inherent speed measurement ability, the radar sensor is a suitable option for this application. However, the sparse radar data from conventional commercial radars make transportation mode classification extremely challenging. Thus, we propose to use a high-resolution millimeter-wave (mmWave) radar sensor to obtain a relatively richer radar point cloud representation for a traffic monitoring scenario. Based on a new feature vector, we use the multivariate Gaussian mixture model (GMM) to perform radar point cloud segmentation, i.e. "point-wise" classification, in an unsupervised learning environment. In our experiment, we collected radar point clouds for pedestrians and cars, which also contained the inevitable clutter from the surroundings. The experimental results using GMM on the new feature vector demonstrated good segmentation performance in terms of the intersection-over-union (IoU) metrics. The detailed methodology and validation metrics are presented and discussed.
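The point-wise IoU used to score such a segmentation can be sketched as follows. The example assumes that the unsupervised cluster indices have already been mapped to class names, which is a simplification; the labels and the helper name `point_iou` are illustrative and not taken from the paper.

```python
def point_iou(predicted, reference, label):
    """Point-wise intersection-over-union for one class.

    intersection: points labeled `label` in both lists
    union:        points labeled `label` in at least one list
    """
    pairs = list(zip(predicted, reference))
    intersection = sum(1 for p, r in pairs if p == label and r == label)
    union = sum(1 for p, r in pairs if p == label or r == label)
    return intersection / union if union else float("nan")


# Illustrative per-point labels (made-up data, one entry per radar point).
predicted = ["car", "car", "ped", "clutter", "ped", "car"]
reference = ["car", "car", "ped", "ped", "ped", "clutter"]

for name in ("car", "ped", "clutter"):
    print(name, point_iou(predicted, reference, name))
```

Reporting the score per class, as here, keeps a dominant class such as clutter from masking poor performance on sparse classes such as pedestrians.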
LatticeNet: Fast Point Cloud Segmentation Using Permutohedral Lattices
Radu Alexandru Rosu, Peer Schütt, Jan Quenzel, Sven Behnke
Abstract: Deep convolutional neural networks (CNNs) have shown outstanding performance in the task of semantically segmenting images. However, applying the same methods on 3D data still poses challenges due to the heavy memory requirements and the lack of structured data. Here, we propose LatticeNet, a novel approach for 3D semantic segmentation, which takes as input raw point clouds. A PointNet describes the local geometry which we embed into a sparse permutohedral lattice. The lattice allows for fast convolutions while keeping a low memory footprint. Further, we introduce DeformSlice, a novel learned data-dependent interpolation for projecting lattice features back onto the point cloud. We present results of 3D segmentation on various datasets where our method achieves state-of-the-art performance.
I. INTRODUCTION
Environment understanding is a crucial ability for autonomous agents.
Scene understanding based on LiDAR point clouds is an essential task for autonomous cars to drive safely, and it often employs spherical projection to map the 3D point cloud into multi-channel 2D images for semantic segmentation. Most existing methods simply stack different point attributes/modalities (e.g. coordinates, intensity, depth, etc.) as image channels to increase information capacity, but ignore the distinct characteristics of point attributes in different image channels. We design FPS-Net, a convolutional fusion network that exploits the uniqueness and discrepancy among the projected image channels for optimal point cloud segmentation. FPS-Net adopts an encoder-decoder structure. Instead of simply stacking multiple channel images as a single input, we group them into different modalities to first learn modality-specific features separately and then map the learned features into a common high-dimensional feature space for pixel-level fusion and learning. Specifically, we design a residual dense block with multiple receptive fields as a building block in the encoder, which preserves detailed information in each modality and learns hierarchical modality-specific and fused features effectively. In the FPS-Net decoder, we likewise use a recurrent convolution block to hierarchically decode fused features into output space for pixel-level classification. Extensive experiments conducted on two widely adopted point cloud datasets show that FPS-Net achieves superior semantic segmentation as compared with state-of-the-art projection-based methods. In addition, the proposed modality fusion idea is compatible with typical projection-based methods and can be incorporated into them with consistent performance improvements.
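The spherical projection step mentioned above can be sketched as follows. This is the generic range-image mapping used by projection-based methods, not code from FPS-Net; the image size and vertical field of view (`width`, `height`, `fov_up_deg`, `fov_down_deg`) are illustrative assumptions that depend on the sensor, and the function name `spherical_project` is hypothetical.

```python
import math


def spherical_project(points, width=2048, height=64,
                      fov_up_deg=3.0, fov_down_deg=-25.0):
    """Map (x, y, z, intensity) points to pixels of a 2D range image.

    Returns a dict {(row, col): (depth, x, y, z, intensity)}. The stored
    tuple holds the per-pixel attributes that would become image channels.
    Assumption: field-of-view and image size are example values only.
    """
    fov_up = math.radians(fov_up_deg)
    fov_down = math.radians(fov_down_deg)
    fov = fov_up - fov_down
    image = {}
    for x, y, z, intensity in points:
        depth = math.sqrt(x * x + y * y + z * z)
        if depth == 0.0:
            continue  # the sensor origin has no defined direction
        yaw = math.atan2(y, x)         # horizontal angle
        pitch = math.asin(z / depth)   # vertical angle
        col = int(0.5 * (1.0 - yaw / math.pi) * width)
        row = int((1.0 - (pitch - fov_down) / fov) * height)
        # Clamp points that fall on or outside the image border.
        col = min(max(col, 0), width - 1)
        row = min(max(row, 0), height - 1)
        # When several points hit one pixel, keep the nearest one.
        if (row, col) not in image or depth < image[(row, col)][0]:
            image[(row, col)] = (depth, x, y, z, intensity)
    return image


# Illustrative points: two along the same ray, one to the side.
cloud = [(10.0, 0.0, 0.0, 0.5), (20.0, 0.0, 0.0, 0.9), (0.0, 10.0, 0.0, 0.2)]
image = spherical_project(cloud)
print(sorted(image))
```

Each pixel then carries coordinates, depth, and intensity, which a simple pipeline would stack as channels and which a modality-fusion design would instead group and process separately.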