Pre-Training LiDAR-Based 3D Object Detectors Through Colorization
Pan, Tai-Yu, Ma, Chenyang, Chen, Tianle, Phoo, Cheng Perng, Luo, Katie Z, You, Yurong, Campbell, Mark, Weinberger, Kilian Q., Hariharan, Bharath, Chao, Wei-Lun
–arXiv.org Artificial Intelligence
Accurate 3D object detection and understanding for self-driving cars relies heavily on LiDAR point clouds, necessitating large amounts of labeled data to train. In this work, we introduce an innovative pre-training approach, Grounded Point Colorization (GPC), to bridge the gap between data and labels by teaching the model to colorize LiDAR point clouds, equipping it with valuable semantic cues. To tackle challenges arising from color variations and selection bias, we incorporate color as "context" by providing ground-truth colors as hints during colorization. Even with limited labeled data, GPC significantly improves fine-tuning performance; notably, with just 20% of the KITTI dataset, GPC outperforms training from scratch on the entire dataset. In sum, we introduce a fresh perspective on pre-training for 3D object detection, aligning the objective with the model's intended role and ultimately advancing the accuracy and efficiency of 3D object detection for autonomous vehicles.

Detecting objects such as vehicles and pedestrians in 3D is crucial for self-driving cars to operate safely. Mainstream 3D object detectors (Shi et al., 2019; 2020b; Zhu et al., 2020; He et al., 2020a) take LiDAR point clouds as input, which provide precise 3D signals of the surrounding environment. However, training a detector requires a large amount of labeled data, and the expensive process of curating annotated data has motivated the community to investigate model pre-training using unlabeled data that can be collected easily. Most existing pre-training methods are built upon contrastive learning (Yin et al., 2022; Xie et al., 2020; Zhang et al., 2021; Huang et al., 2021; Liang et al., 2021), inspired by its success in 2D recognition (Chen et al., 2020a; He et al., 2020b); their key novelties, however, are often limited to how the positive and negative data pairs are constructed. This paper attempts to go beyond contrastive learning by providing a new perspective on pre-training 3D object detectors: we rethink the role of pre-training in how it can facilitate downstream fine-tuning with labeled data.
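The GPC objective described in the abstract amounts to conditional colorization: a point-cloud encoder receives LiDAR coordinates together with the ground-truth colors of a small, randomly chosen subset of points (the "hints"), and is trained to predict colors for the remaining points. The sketch below is only an illustration of that idea; the module and parameter names (backbone, hint_ratio, GPCPretrainObjective) are assumptions rather than the authors' implementation, and the per-point colors are assumed to come from projecting LiDAR points onto synchronized camera images.

```python
import torch
import torch.nn as nn


class GPCPretrainObjective(nn.Module):
    """Sketch of a GPC-style pre-training objective: predict per-point RGB,
    conditioned on the true colors of a random subset of points ("hints")."""

    def __init__(self, backbone: nn.Module, feat_dim: int, hint_ratio: float = 0.2):
        super().__init__()
        self.backbone = backbone                   # any per-point encoder: (B, N, 7) -> (B, N, feat_dim)
        self.color_head = nn.Linear(feat_dim, 3)   # regresses RGB for every point
        self.hint_ratio = hint_ratio               # fraction of points whose true color is revealed

    def forward(self, points: torch.Tensor, gt_colors: torch.Tensor) -> torch.Tensor:
        # points:    (B, N, 3) LiDAR xyz coordinates
        # gt_colors: (B, N, 3) colors obtained by projecting points onto camera images
        B, N, _ = points.shape
        hint_mask = (torch.rand(B, N, device=points.device) < self.hint_ratio).float()
        hinted = gt_colors * hint_mask.unsqueeze(-1)           # non-hint colors are zeroed out

        # "Color as context": feed coordinates, hinted colors, and the hint indicator.
        feats = self.backbone(torch.cat([points, hinted, hint_mask.unsqueeze(-1)], dim=-1))
        pred = self.color_head(feats)                          # (B, N, 3)

        # Supervise colorization only on points whose color was withheld.
        w = (1.0 - hint_mask).unsqueeze(-1)
        return ((pred - gt_colors) ** 2 * w).sum() / w.sum().clamp(min=1.0)
```

A toy backbone such as a shared MLP over per-point features is enough to run this sketch; in practice the encoder would be the detector's 3D backbone, whose pre-trained weights then initialize fine-tuning on labeled detection data.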
Oct-23-2023