CVCP-Fusion: On Implicit Depth Estimation for 3D Bounding Box Prediction

Pranav Gupta, Rishabh Rengarajan, Viren Bankapur, Vedansh Mannem, Lakshit Ahuja, Surya Vijay, Kevin Wang

Oct-15-2024–arXiv.org Artificial Intelligence

Combining LiDAR and camera-view data has become a common approach for 3D object detection. However, previous approaches combine the two input streams at the point level, throwing away semantic information derived from camera features. In this paper we propose Cross-View Center Point-Fusion (CVCP-Fusion), a state-of-the-art model that performs 3D object detection by combining camera- and LiDAR-derived features in the bird's-eye-view (BEV) space, preserving semantic density from the camera stream while incorporating spatial data from the LiDAR stream. Our architecture draws on two previously established algorithms--Cross-View Transformers and CenterPoint--and runs their backbones in parallel, allowing efficient computation for real-time processing and application. We find that while an implicitly calculated depth estimate may be sufficiently accurate in a 2D map-view representation, explicitly calculated geometric and spatial information is needed for precise bounding box prediction in the 3D world-view space. Code to reproduce our results is available at https: // github.
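To make the idea of BEV-level fusion concrete, the following is a minimal sketch, not the authors' implementation. It assumes the camera stream has already produced a BEV feature grid, rasterizes LiDAR points into a grid of the same shape, and fuses the two by per-cell channel concatenation. The abstract does not state the fusion operator, so concatenation is an assumption, and the names `lidar_to_bev` and `fuse_bev` are hypothetical.

```python
from typing import List, Sequence, Tuple

# A BEV grid indexed as grid[row][col][channel].
Grid = List[List[List[float]]]


def lidar_to_bev(points: Sequence[Tuple[float, float, float]],
                 x_range: Tuple[float, float],
                 y_range: Tuple[float, float],
                 cell: float) -> Grid:
    """Rasterize (x, y, z) points into a BEV grid.

    Each cell holds two explicit geometric channels:
    [point count, maximum height]. Points outside the ranges are dropped.
    """
    cols = int(round((x_range[1] - x_range[0]) / cell))
    rows = int(round((y_range[1] - y_range[0]) / cell))
    grid: Grid = [[[0.0, 0.0] for _ in range(cols)] for _ in range(rows)]
    for x, y, z in points:
        if not (x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]):
            continue
        # Clamp to guard against floating-point rounding at the upper edge.
        c = min(int((x - x_range[0]) / cell), cols - 1)
        r = min(int((y - y_range[0]) / cell), rows - 1)
        feat = grid[r][c]
        feat[1] = z if feat[0] == 0 else max(feat[1], z)
        feat[0] += 1.0
    return grid


def fuse_bev(camera_bev: Grid, lidar_bev: Grid) -> Grid:
    """Fuse two BEV grids of equal spatial size by concatenating channels.

    Concatenation is an assumed fusion operator chosen for illustration.
    """
    if len(camera_bev) != len(lidar_bev):
        raise ValueError("BEV grids must have the same number of rows")
    fused: Grid = []
    for cam_row, lid_row in zip(camera_bev, lidar_bev):
        if len(cam_row) != len(lid_row):
            raise ValueError("BEV grids must have the same number of columns")
        fused.append([cam + lid for cam, lid in zip(cam_row, lid_row)])
    return fused


# Toy example: a 2 x 4 grid with 1 m cells.
points = [(0.5, 0.5, 1.0), (0.6, 0.4, 2.0), (3.5, 1.5, -0.5), (10.0, 10.0, 0.0)]
lidar_bev = lidar_to_bev(points, x_range=(0.0, 4.0), y_range=(0.0, 2.0), cell=1.0)
# Stand-in for camera-derived semantic features (two channels per cell).
camera_bev = [[[1.0, 0.0] for _ in range(4)] for _ in range(2)]
fused = fuse_bev(camera_bev, lidar_bev)
```

In this sketch every fused cell keeps the full camera feature vector alongside the LiDAR geometry, in contrast to point-level fusion, where camera information is only retained at locations that have a LiDAR return.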


