MamBEV: Enabling State Space Models to Learn Birds-Eye-View Representations

Ke, Hongyu, Morris, Jack, Oguchi, Kentaro, Cao, Xiaofei, Liu, Yongkang, Wang, Haoxin, Ding, Yi

Mar-17-2025–arXiv.org Artificial Intelligence

Designing computationally efficient methods for 3D visual perception remains a significant challenge. In this paper, we propose a Mamba-based framework called MamBEV, which learns unified Bird's Eye View (BEV) representations using linear spatio-temporal SSM-based attention. This approach supports multiple 3D perception tasks with significantly improved computational and memory efficiency. Furthermore, we introduce SSM-based cross-attention, analogous to standard cross-attention, where BEV query representations can interact with relevant image features. Extensive experiments demonstrate MamBEV's promising performance across diverse visual perception metrics, highlighting its advantages in input scaling efficiency compared to existing benchmark models. The code is available at https://github.com/amaigsu/MamBEV.

Automatically constructing a bird's-eye-view (BEV) of an object's surrounding environment is beneficial for tasks such as autonomous driving and driver assistance systems (Wang et al., 2023a). BEV construction methods typically integrate the signals received by multi-view cameras and transform them into a top-down view of the surrounding environment. Furthermore, as these systems operate in a mobile edge environment, it is important to consider computational cost in conjunction with construction accuracy (Ke et al., 2024). Examples of deployed BEV systems can be seen in Tesla cars (Tesla, 2021). These detailed constructions can be used for downstream perception, prediction, and planning tasks (Casas et al., 2021; Hu et al., 2023).



Country:
- North America > United States (0.28)

Genre:
- Research Report > New Finding (0.46)

Industry:
- Automobiles & Trucks > Manufacturer (1.00)
- Transportation > Ground
  - Road (0.88)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks
    - Deep Learning (0.46)
  - Natural Language (0.88)
  - Representation & Reasoning (0.83)
  - Robots (0.66)
  - Vision (0.95)