Mamba-FETrack V2: Revisiting State Space Model for Frame-Event based Visual Object Tracking
Wang, Shiao, Huang, Ju, Ma, Qingchuan, Gao, Jinfeng, Xu, Chunyi, Wang, Xiao, Chen, Lan, Jiang, Bo
arXiv.org Artificial Intelligence
Combining traditional RGB cameras with bio-inspired event cameras for robust object tracking has garnered increasing attention in recent years. However, most existing multimodal tracking algorithms depend heavily on high-complexity Vision Transformer architectures for feature extraction and fusion across modalities. This not only leads to substantial computational overhead but also limits the effectiveness of cross-modal interactions. In this paper, we propose an efficient RGB-Event object tracking framework based on the linear-complexity Vision Mamba network, termed Mamba-FETrack V2. Specifically, we first design a lightweight Prompt Generator that utilizes embedded features from each modality, together with a shared prompt pool, to dynamically generate modality-specific learnable prompt vectors. These prompts, along with the modality-specific embedded features, are then fed into a Vision Mamba-based FEMamba backbone, which performs prompt-guided feature extraction, cross-modal interaction, and fusion in a unified manner. Finally, the fused representations are passed to the tracking head for accurate target localization. Extensive experimental evaluations on multiple RGB-Event tracking benchmarks, including the short-term COESOT dataset and the long-term FE108 and FELT V2 datasets, demonstrate the superior performance and efficiency of the proposed tracking framework.

Visual Object Tracking (VOT) is a crucial task in computer vision: given the position of a target in the first frame of a video, the tracker must locate that target in the subsequent frames. The task has significant practical value in fields such as security surveillance, autonomous driving perception, sports analytics, and human-computer interaction. Currently, most visual object tracking algorithms [1]-[4] are designed and developed based on RGB cameras.
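The prompt-generation step described in the abstract can be illustrated with a small sketch. The abstract does not specify how the Prompt Generator combines the embedded features with the shared prompt pool, so everything here is an assumption made for illustration: the function name `generate_prompt`, the mean-pooled query, the scaled dot-product scoring, the softmax mixing of pool entries, and all sizes are hypothetical and are not taken from the paper.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_pool(tokens):
    """Average a list of token vectors into a single descriptor vector."""
    dim = len(tokens[0])
    return [sum(tok[i] for tok in tokens) / len(tokens) for i in range(dim)]


def generate_prompt(tokens, prompt_pool):
    """Hypothetical prompt generator (assumed design, not the paper's).

    Scores every entry of the shared pool against a mean-pooled summary of
    one modality's tokens, then returns the softmax-weighted mix of entries.
    """
    dim = len(tokens[0])
    query = mean_pool(tokens)
    scores = [
        sum(q * p for q, p in zip(query, entry)) / math.sqrt(dim)
        for entry in prompt_pool
    ]
    weights = softmax(scores)
    prompt = [
        sum(w * entry[i] for w, entry in zip(weights, prompt_pool))
        for i in range(dim)
    ]
    return prompt, weights


def random_vectors(rng, count, dim):
    """Stand-in for learned embeddings: Gaussian random vectors."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(count)]


rng = random.Random(0)
DIM, POOL_SIZE, NUM_TOKENS = 8, 4, 16  # illustrative sizes only

prompt_pool = random_vectors(rng, POOL_SIZE, DIM)    # shared by both modalities
rgb_tokens = random_vectors(rng, NUM_TOKENS, DIM)    # embedded RGB features
event_tokens = random_vectors(rng, NUM_TOKENS, DIM)  # embedded event features

rgb_prompt, rgb_weights = generate_prompt(rgb_tokens, prompt_pool)
event_prompt, event_weights = generate_prompt(event_tokens, prompt_pool)

# Each prompt is prepended to its modality's tokens before the backbone.
rgb_sequence = [rgb_prompt] + rgb_tokens
event_sequence = [event_prompt] + event_tokens
```

Because both modalities draw on the same pool but score it with their own features, the resulting prompts differ per modality while sharing parameters, which is one plausible reading of "modality-specific prompts from a shared prompt pool".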
Jul-1-2025
- Country:
- Asia > China > Anhui Province > Hefei (0.04)
- Genre:
- Research Report (0.82)
- Industry:
- Information Technology (0.34)
- Technology: