EMIFF: Enhanced Multi-scale Image Feature Fusion for Vehicle-Infrastructure Cooperative 3D Object Detection

Wang, Zhe, Fan, Siqi, Huo, Xiaoliang, Xu, Tongda, Wang, Yan, Liu, Jingjing, Chen, Yilun, Zhang, Ya-Qin

Feb-23-2024–arXiv.org Artificial Intelligence

In autonomous driving, cooperative perception makes use of multi-view cameras from both vehicles and infrastructure, providing a global vantage point with rich semantic context of road conditions beyond a single vehicle viewpoint. Currently, two major challenges persist in vehicle-infrastructure cooperative 3D (VIC3D) object detection: 1) inherent pose errors when fusing multi-view images, caused by time asynchrony across cameras; 2) information loss during transmission, resulting from limited communication bandwidth. To address these issues, we propose a novel camera-based 3D detection framework for the VIC3D task, Enhanced Multi-scale Image Feature Fusion (EMIFF). To fully exploit holistic perspectives from both vehicles and infrastructure, we propose Multi-scale Cross Attention (MCA) and Camera-aware Channel Masking (CCM) modules to enhance infrastructure and vehicle features at scale, spatial, and channel levels to correct the pose error introduced by camera asynchrony. We also introduce a Feature Compression (FC) module with channel and spatial compression blocks for transmission efficiency. Experiments show that EMIFF achieves state-of-the-art (SOTA) results on the DAIR-V2X-C dataset, significantly outperforming previous early-fusion and late-fusion methods with comparable transmission costs.
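
To make the bandwidth constraint concrete, the following minimal sketch estimates how many bytes a single feature map occupies before and after channel and spatial compression. It is an illustration only: the function name `feature_bytes`, the tensor shape, the compression ratios, and the float32 storage assumption are all hypothetical and are not taken from the paper.

```python
def feature_bytes(channels, height, width,
                  channel_ratio=1, spatial_ratio=1, bytes_per_value=4):
    """Estimate the payload size in bytes of one C x H x W feature map.

    channel_ratio   -- hypothetical factor by which the channel count is reduced
    spatial_ratio   -- hypothetical factor by which each spatial side is reduced
    bytes_per_value -- 4 assumes float32 values (an assumption, not from the paper)
    """
    c = channels // channel_ratio
    h = height // spatial_ratio
    w = width // spatial_ratio
    return c * h * w * bytes_per_value


# Hypothetical infrastructure-side feature map: 256 channels, 64 x 64 cells.
raw = feature_bytes(256, 64, 64)
compressed = feature_bytes(256, 64, 64, channel_ratio=8, spatial_ratio=2)

# Spatial compression acts on both height and width, so its effect is squared.
reduction = raw / compressed
```

Under these assumed values, an 8x channel reduction combined with a 2x reduction per spatial side shrinks the payload by a factor of 8 x 2 x 2 = 32, which illustrates why compressing along both axes is attractive when communication bandwidth is limited.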

Country:
- Asia
  - Middle East > Israel
    - Tel Aviv District > Tel Aviv (0.04)
  - China > Beijing
    - Beijing (0.04)

Genre:
- Research Report (0.50)

Industry:
- Automobiles & Trucks (0.35)
- Information Technology (0.35)
- Transportation > Ground
  - Road (0.35)

Technology:
- Information Technology
  - Sensing and Signal Processing > Image Processing (1.00)
  - Artificial Intelligence
    - Vision (1.00)
    - Machine Learning (1.00)
    - Representation & Reasoning > Information Fusion (0.49)
    - Robots > Autonomous Vehicles (0.35)
