Mamba-FETrack V2: Revisiting State Space Model for Frame-Event based Visual Object Tracking

Wang, Shiao, Huang, Ju, Ma, Qingchuan, Gao, Jinfeng, Xu, Chunyi, Wang, Xiao, Chen, Lan, Jiang, Bo

arXiv.org Artificial Intelligence 

Combining traditional RGB cameras with bio-inspired event cameras for robust object tracking has garnered increasing attention in recent years. However, most existing multimodal tracking algorithms depend heavily on high-complexity Vision Transformer architectures for feature extraction and fusion across modalities. This not only leads to substantial computational overhead but also limits the effectiveness of cross-modal interactions. In this paper, we propose an efficient RGB-Event object tracking framework based on the linear-complexity Vision Mamba network, termed Mamba-FETrack V2. Specifically, we first design a lightweight Prompt Generator that utilizes embedded features from each modality, together with a shared prompt pool, to dynamically generate modality-specific learnable prompt vectors. These prompts, along with the modality-specific embedded features, are then fed into a Vision Mamba-based FEMamba backbone, which performs prompt-guided feature extraction, cross-modal interaction, and fusion in a unified manner. Finally, the fused representations are passed to the tracking head for accurate target localization. Extensive experimental evaluations on multiple RGB-Event tracking benchmarks, including the short-term COESOT dataset and the long-term FE108 and FELT V2 datasets, demonstrate the superior performance and efficiency of the proposed tracking framework.

Visual Object Tracking (VOT) is a crucial task in computer vision: given a target's position in the first frame of a video, the tracker must locate that target in all subsequent frames. The task has significant practical value in fields such as security surveillance, autonomous driving perception, sports analytics, and human-computer interaction. Currently, most visual object tracking algorithms [1]-[4] are designed and developed based on RGB cameras.
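To make the Prompt Generator described in the abstract more concrete, the following is a minimal sketch of one way a shared prompt pool could yield modality-specific prompts. It is an illustration under stated assumptions, not the authors' implementation: the mean-pooled query, the scaled dot-product attention over pool entries, the function names (`mean_pool`, `generate_prompt`), and all sizes are hypothetical, and the pool is randomly initialised here rather than learned.

```python
import math
import random


def mean_pool(tokens):
    """Average a list of D-dimensional token vectors into one D-dimensional query."""
    n = len(tokens)
    return [sum(column) / n for column in zip(*tokens)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def generate_prompt(tokens, prompt_pool):
    """Hypothetical prompt generator (an assumption, not the paper's design).

    The embedded features of one modality are pooled into a query, the query
    attends over the shared pool, and the prompt is the attention-weighted
    sum of the pool entries.
    """
    query = mean_pool(tokens)
    dim = len(query)
    scores = [
        sum(q * p for q, p in zip(query, entry)) / math.sqrt(dim)
        for entry in prompt_pool
    ]
    weights = softmax(scores)
    prompt = [
        sum(w * entry[d] for w, entry in zip(weights, prompt_pool))
        for d in range(dim)
    ]
    return prompt, weights


random.seed(0)
DIM, POOL_SIZE, NUM_TOKENS = 8, 4, 16  # illustrative sizes only

# One pool shared by both modalities; in a trained model it would be learnable.
prompt_pool = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(POOL_SIZE)]

# Stand-ins for the embedded RGB and event features.
rgb_tokens = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_TOKENS)]
event_tokens = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_TOKENS)]

rgb_prompt, rgb_weights = generate_prompt(rgb_tokens, prompt_pool)
event_prompt, event_weights = generate_prompt(event_tokens, prompt_pool)
```

Because both calls read from the same `prompt_pool` but use different queries, each modality receives its own prompt while the pool parameters stay shared, which is the property the abstract attributes to the Prompt Generator.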
