Viewpoint Equivariance for Multi-View 3D Object Detection

Chen, Dian, Li, Jie, Guizilini, Vitor, Ambrus, Rares, Gaidon, Adrien

Apr-7-2023–arXiv.org Artificial Intelligence

3D object detection from visual sensors is a cornerstone capability of robotic systems. State-of-the-art methods focus on reasoning and decoding object bounding boxes from multi-view camera input. In this work, we draw on the integral role of multi-view consistency in 3D scene understanding and geometric learning. To this end, we introduce VEDet, a novel 3D object detection framework that exploits 3D multi-view geometry to improve localization through viewpoint awareness and equivariance. VEDet leverages a query-based transformer architecture and encodes the 3D scene by augmenting image features with positional encodings from their 3D perspective geometry. We design view-conditioned queries at the output level, which enable the generation of multiple virtual frames during training to learn viewpoint equivariance by enforcing multi-view consistency. The multi-view geometry, injected at the input level as positional encodings and regularized at the loss level, provides rich geometric cues for 3D object detection, leading to state-of-the-art performance on the nuScenes benchmark. The code and model are available at https://github.com/TRI-ML/VEDet.
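
To make the notion of viewpoint equivariance concrete, the following is a minimal sketch and not the VEDet implementation. It assumes a simplified box parameterization (center plus yaw) and a view frame that differs from the global frame by a translation and a rotation about the vertical axis. All function names (`to_view`, `to_global`, `consistency_error`) are hypothetical and chosen for illustration only. The idea it illustrates: if the same object is predicted in two different virtual frames, mapping both predictions back to a common frame should yield the same box, and any disagreement can serve as a consistency penalty.

```python
import math


def wrap(angle):
    """Wrap an angle to the interval [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi


def to_view(box, view):
    """Express a global-frame box (x, y, z, yaw) in a view frame.

    `view` is (tx, ty, tz, view_yaw): the assumed pose of the view
    frame in the global frame (translation plus rotation about z).
    """
    x, y, z, yaw = box
    tx, ty, tz, vyaw = view
    dx, dy = x - tx, y - ty
    c, s = math.cos(vyaw), math.sin(vyaw)
    # Rotate the offset by -view_yaw (inverse of the view rotation).
    return (c * dx + s * dy, -s * dx + c * dy, z - tz, wrap(yaw - vyaw))


def to_global(box, view):
    """Inverse of `to_view`: map a view-frame box back to the global frame."""
    x, y, z, yaw = box
    tx, ty, tz, vyaw = view
    c, s = math.cos(vyaw), math.sin(vyaw)
    return (c * x - s * y + tx, s * x + c * y + ty, z + tz, wrap(yaw + vyaw))


def consistency_error(pred_a, view_a, pred_b, view_b):
    """Disagreement between two view-conditioned predictions of one object.

    Both predictions are mapped to the global frame; the error is the
    center distance plus the absolute yaw difference (an illustrative
    choice, not the loss used in the paper).
    """
    ga = to_global(pred_a, view_a)
    gb = to_global(pred_b, view_b)
    position_error = math.dist(ga[:3], gb[:3])
    yaw_error = abs(wrap(ga[3] - gb[3]))
    return position_error + yaw_error


# One ground-truth box seen from two hypothetical virtual frames.
box = (10.0, 2.0, 0.5, 0.3)
view_a = (0.0, 0.0, 0.0, 0.0)
view_b = (1.0, -2.0, 0.0, math.pi / 2)

pred_a = to_view(box, view_a)  # what a perfect detector would output in frame A
pred_b = to_view(box, view_b)  # and in frame B
error = consistency_error(pred_a, view_a, pred_b, view_b)
```

For a perfectly equivariant detector, `error` is zero up to floating-point precision. Perturbing either prediction makes it grow, which is the kind of signal a multi-view consistency term could penalize during training.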
