CVCP-Fusion: On Implicit Depth Estimation for 3D Bounding Box Prediction
Pranav Gupta, Rishabh Rengarajan, Viren Bankapur, Vedansh Mannem, Lakshit Ahuja, Surya Vijay, Kevin Wang
arXiv.org Artificial Intelligence
Combining LiDAR and camera-view data has become a common approach to 3D object detection. However, previous approaches fuse the two input streams at the point level, discarding semantic information derived from camera features. In this paper we propose Cross-View Center Point-Fusion (CVCP-Fusion), a state-of-the-art model that performs 3D object detection by combining camera- and LiDAR-derived features in the BEV space, preserving semantic density from the camera stream while incorporating spatial data from the LiDAR stream. Our architecture builds on two previously established algorithms, Cross-View Transformers and CenterPoint, and runs their backbones in parallel, allowing efficient computation for real-time processing and application. We find that while an implicitly calculated depth estimate may be sufficiently accurate in a 2D map-view representation, explicitly calculated geometric and spatial information is needed for precise bounding box prediction in the 3D world-view space. Code to reproduce our results is available at https://github.
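The fusion step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the grid size, channel counts, and variable names are assumptions, and the two backbones are stood in for by random feature maps on a shared BEV grid.

```python
import numpy as np

# Hypothetical shapes: assume both backbones are projected onto a shared
# 200x200 BEV grid (sizes and channel counts are illustrative only).
H, W = 200, 200
cam_bev = np.random.rand(64, H, W)     # stand-in for Cross-View Transformer features
lidar_bev = np.random.rand(128, H, W)  # stand-in for CenterPoint features

# Fuse by channel-wise concatenation in the shared BEV frame, keeping
# camera-derived semantics alongside LiDAR-derived geometry; a detection
# head would then consume the fused map.
fused = np.concatenate([cam_bev, lidar_bev], axis=0)
print(fused.shape)  # (192, 200, 200)
```

Because the two branches only meet at this concatenation, they can run in parallel, which is what enables the real-time computation the abstract mentions.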
Oct-15-2024