Table A1: 3D semantic segmentation results on the SemanticKITTI validation set. SemanticKITTI provides 22 sequences with 19 semantic classes, captured by a 64-beam LiDAR sensor. We implemented our method with PyTorch using the open-source OpenPCDet [1]. The fade strategy was used during the last 5 epochs. The 4th and 5th models sequentially incorporate our proposed SED blocks and DED blocks.
Fully Sparse 3D Object Detection
As the perception range of LiDAR sensors increases, LiDAR-based 3D object detection becomes central to long-range perception for autonomous driving. Mainstream 3D object detectors usually build dense feature maps in the network backbone and prediction head. However, the computational and spatial costs of a dense feature map are quadratic in the perception range, which makes such detectors difficult to scale to the long-range setting. To enable efficient long-range LiDAR-based object detection, we build a fully sparse 3D object detector (FSD), whose computational and spatial cost is roughly linear in the number of points and independent of the perception range.
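The scaling argument above can be made concrete with a toy calculation. This is an illustrative sketch only, not FSD's implementation: the voxel size and point density below are made-up parameters, chosen just to show that a dense bird's-eye-view grid grows quadratically with range while the LiDAR point count grows roughly linearly.

```python
def dense_cells(perception_range_m: float, voxel_size_m: float = 0.2) -> int:
    """Number of cells in a square dense BEV grid covering [-R, R]^2.

    The cell count is quadratic in the perception range R.
    """
    side = int(2 * perception_range_m / voxel_size_m)
    return side * side


def sparse_points(perception_range_m: float, points_per_meter: float = 1500.0) -> int:
    """Rough LiDAR return count (illustrative density assumption).

    Grows roughly linearly with range; in practice it saturates as beams
    spread out, which only strengthens the sparse detector's advantage.
    """
    return int(points_per_meter * perception_range_m)


for r in (50, 100, 200):
    print(f"range={r} m: dense cells={dense_cells(r):>9,}  points={sparse_points(r):>8,}")
```

Doubling the range quadruples the dense workload but only doubles the sparse one, which is why a detector that touches only occupied locations stays tractable at long range.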
A Appendix
The majority of the appendix is devoted to a faithful listing of hyperparameters, datasets, and training settings for all of our experiments.

A.1 Addendum on Proposition 3 and Choice of Activation Function

We begin with a proof of Proposition 3, restated here for convenience. We note that the variance claims in both Proposition 3 and Corollary 3.1 are relatively simple extensions of the intuitive result that the variance of a random variable that can take on only two values is maximized when the two values each have a probability weight of 1/2. As described in Section 3.2, it is necessary to develop a version of GradDrop that operates nontrivially on batched gradients. The issue we need to resolve is that these gradients depend on their batch's input values, so simply summing gradients across the batch dimension is not an option.
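As a rough illustration of the sign-based masking idea behind GradDrop: a simplified, hypothetical sketch for a single scalar parameter is given below. The sign-purity formula and the all-or-nothing sampling rule are simplifying assumptions for exposition, not the paper's exact batched algorithm.

```python
import random


def graddrop_mask(grads, rng=random.Random(0)):
    """Simplified sketch of sign-based gradient dropping (one scalar parameter).

    `grads` is a list of per-example gradients. We measure how consistently
    they agree in sign, then stochastically keep only one sign's gradients,
    zeroing the rest. Illustrative only.
    """
    total_abs = sum(abs(g) for g in grads) or 1.0
    # "Positive sign purity": 1.0 if all gradients are positive, 0.0 if all negative.
    p = 0.5 * (1.0 + sum(grads) / total_abs)
    keep_positive = rng.random() < p
    return [g if (g > 0) == keep_positive else 0.0 for g in grads]
```

When every per-example gradient already agrees in sign, the mask keeps all of them unchanged; disagreement makes the outcome stochastic, so conflicting directions are not silently averaged away by a plain batch sum.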
Data-Efficient Point Cloud Semantic Segmentation Pipeline for Unimproved Roads
Andrew Yarovoi and Christopher R. Valenta
In this case study, we present a data-efficient point cloud segmentation pipeline and training framework for robust segmentation of unimproved roads and seven other classes. Our method employs a two-stage training framework: first, a projection-based convolutional neural network is pre-trained on a mixture of public urban datasets and a small, curated in-domain dataset; then, a lightweight prediction head is fine-tuned exclusively on in-domain data. Along the way, we explore the application of Point Prompt Training to batch normalization layers and the effects of Manifold Mixup as a regularizer within our pipeline. We also explore the effects of incorporating histogram-normalized ambients to further boost performance. Using only 50 labeled point clouds from our target domain, we show that our proposed training approach improves mean Intersection-over-Union from 33.5% to 51.8% and overall accuracy from 85.5% to 90.8%, when compared to naive training on the in-domain data. Crucially, our results demonstrate that pre-training across multiple datasets is key to improving generalization and enabling robust segmentation under limited in-domain supervision. Overall, this study demonstrates a practical framework for robust 3D semantic segmentation in challenging, low-data scenarios.

Semantic segmentation of 3D point clouds is a foundational task for scene understanding, enabling a range of downstream applications such as autonomous route planning and infrastructure inspection. Despite significant progress in this field, most state-of-the-art segmentation models rely heavily on the availability of large, labeled training datasets. However, generating labeled point cloud data remains a substantial bottleneck: manual annotation is both labor-intensive and time-consuming, requiring over 30 minutes per scan on average in our experiments.
This challenge makes it impractical to recreate large-scale datasets, commonly containing over 25,000 scans, for new or underrepresented environments.
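For reference, the two metrics reported above (mean Intersection-over-Union and overall accuracy) are typically computed from a per-class confusion matrix as sketched below. The 3x3 matrix here is toy data for illustration, not our results.

```python
def metrics(conf):
    """Compute (mean IoU, overall accuracy) from a square confusion matrix.

    conf[i][j] counts points with ground-truth class i predicted as class j.
    """
    n = len(conf)
    total = sum(sum(row) for row in conf)
    correct = sum(conf[i][i] for i in range(n))
    overall_acc = correct / total

    ious = []
    for c in range(n):
        tp = conf[c][c]
        fp = sum(conf[r][c] for r in range(n)) - tp  # predicted c, truth differs
        fn = sum(conf[c]) - tp                       # truth c, predicted differently
        denom = tp + fp + fn
        if denom:  # skip classes absent from both labels and predictions
            ious.append(tp / denom)
    miou = sum(ious) / len(ious)
    return miou, overall_acc


# Toy example: 3 classes, mostly-correct predictions.
conf = [[50, 2, 3],
        [4, 40, 1],
        [2, 2, 10]]
miou, acc = metrics(conf)
```

Note that mIoU penalizes errors on rare classes much more heavily than overall accuracy does, which is why the mIoU gains reported above are larger than the accuracy gains.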