MamBEV: Enabling State Space Models to Learn Birds-Eye-View Representations
Ke, Hongyu, Morris, Jack, Oguchi, Kentaro, Cao, Xiaofei, Liu, Yongkang, Wang, Haoxin, Ding, Yi
arXiv.org Artificial Intelligence
However, designing computationally efficient methods remains a significant challenge. In this paper, we propose a Mamba-based framework called MamBEV, which learns unified Bird's Eye View (BEV) representations using linear spatio-temporal SSM-based attention. This approach supports multiple 3D perception tasks with significantly improved computational and memory efficiency. Furthermore, we introduce SSM-based cross-attention, analogous to standard cross-attention, where BEV query representations can interact with relevant image features. Extensive experiments demonstrate MamBEV's promising performance across diverse visual perception metrics, highlighting its advantages in input scaling efficiency compared to existing benchmark models. The code is available at https://github.com/amaigsu/MamBEV.

Automatically constructing a bird's-eye view (BEV) of an object's surrounding environment is beneficial for tasks such as autonomous driving and driver assistance systems (Wang et al., 2023a). These methods typically integrate the signals received by multi-view cameras and transform them into a top-down view of the surrounding environment. Furthermore, as these systems operate in a mobile edge environment, it is important to consider computational cost in conjunction with construction accuracy (Ke et al., 2024). Examples of deployed BEV systems can be seen in Tesla cars (Tesla, 2021). These detailed constructions can be used for downstream perception, prediction, and planning tasks (Casas et al., 2021; Hu et al., 2023).
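To make the "linear" in linear SSM-based attention concrete, the following is a minimal sketch of a generic diagonal state space recurrence in plain Python. It is not MamBEV's implementation: the function name `ssm_scan` and the fixed parameters `a`, `b`, `c` are assumptions chosen for illustration, whereas Mamba-style models use learned, input-dependent parameters and a hardware-aware scan. The sketch only shows why the cost grows linearly with the number of input tokens: each token updates a fixed-size state once.

```python
from typing import List, Sequence


def ssm_scan(
    x: Sequence[float],
    a: Sequence[float],
    b: Sequence[float],
    c: Sequence[float],
) -> List[float]:
    """Run a diagonal linear state space recurrence over a 1-D sequence.

    Per step (elementwise over the state dimension):
        h_t = a * h_{t-1} + b * x_t
        y_t = sum(c * h_t)

    Hypothetical illustration only: a, b, c are fixed here, not learned
    or input-dependent as in a selective SSM.
    """
    h = [0.0] * len(a)  # hidden state of fixed size, independent of len(x)
    y: List[float] = []
    for x_t in x:  # one pass over the sequence: O(len(x) * len(a)) work
        h = [a_i * h_i + b_i * x_t for a_i, h_i, b_i in zip(a, h, b)]
        y.append(sum(c_i * h_i for c_i, h_i in zip(c, h)))
    return y


if __name__ == "__main__":
    # An impulse input decays geometrically through a single-state SSM.
    print(ssm_scan([1.0, 0.0, 0.0], a=[0.5], b=[1.0], c=[1.0]))
```

Because the state `h` has a fixed size, doubling the number of input tokens doubles the work, in contrast to standard attention, whose cost grows quadratically with sequence length.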
Mar-17-2025
- Country:
  - North America > United States (0.28)
- Genre:
  - Research Report > New Finding (0.46)
- Industry:
  - Automobiles & Trucks > Manufacturer (1.00)
  - Transportation > Ground
    - Road (0.88)
- Technology:
  - Information Technology > Artificial Intelligence
    - Machine Learning > Neural Networks
      - Deep Learning (0.46)
    - Natural Language (0.88)
    - Representation & Reasoning (0.83)
    - Robots (0.66)
    - Vision (0.95)