Mamba-FETrack V2: Revisiting State Space Model for Frame-Event based Visual Object Tracking

Wang, Shiao, Huang, Ju, Ma, Qingchuan, Gao, Jinfeng, Xu, Chunyi, Wang, Xiao, Chen, Lan, Jiang, Bo

Jul-1-2025–arXiv.org Artificial Intelligence

Combining traditional RGB cameras with bio-inspired event cameras for robust object tracking has garnered increasing attention in recent years. However, most existing multimodal tracking algorithms depend heavily on high-complexity Vision Transformer architectures for feature extraction and fusion across modalities. This not only leads to substantial computational overhead but also limits the effectiveness of cross-modal interactions. In this paper, we propose an efficient RGB-Event object tracking framework based on the linear-complexity Vision Mamba network, termed Mamba-FETrack V2. Specifically, we first design a lightweight Prompt Generator that utilizes embedded features from each modality, together with a shared prompt pool, to dynamically generate modality-specific learnable prompt vectors. These prompts, along with the modality-specific embedded features, are then fed into a Vision Mamba-based FEMamba backbone, which facilitates prompt-guided feature extraction, cross-modal interaction, and fusion in a unified manner. Finally, the fused representations are passed to the tracking head for accurate target localization. Extensive experimental evaluations on multiple RGB-Event tracking benchmarks, including the short-term COESOT dataset and the long-term FE108 and FELT V2 datasets, demonstrate the superior performance and efficiency of the proposed tracking framework.

Visual Object Tracking (VOT) is a crucial task in the field of computer vision. It aims to locate a target in subsequent video frames from its initial position in the first frame. The task has significant practical value in fields such as security surveillance, autonomous driving perception, sports analytics, and human-computer interaction. Currently, most visual object tracking algorithms [1]-[4] are designed and developed based on RGB cameras.
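The prompt-generation step mentioned in the abstract can be pictured with a small sketch. The abstract only says that each modality's embedded features are combined with a shared prompt pool to produce modality-specific prompt vectors; it does not say how. The code below assumes one plausible scheme (a mean-pooled feature used as a query, with softmax-weighted mixing of pool entries) and should not be read as the authors' implementation. All names (`generate_prompt`, `pool_keys`, `pool_values`) and all sizes are hypothetical.

```python
import math
import random


def mean_pool(tokens):
    """Average a list of token vectors into a single query vector."""
    dim = len(tokens[0])
    return [sum(tok[d] for tok in tokens) / len(tokens) for d in range(dim)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def generate_prompt(tokens, pool_keys, pool_values):
    """Build one prompt vector for a single modality from a shared pool.

    Assumed scheme (not taken from the paper): the mean-pooled token
    feature is the query, scaled dot products against the pool keys give
    mixing weights, and the prompt is the weighted sum of pool values.
    """
    query = mean_pool(tokens)
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in pool_keys]
    weights = softmax(scores)
    dim = len(pool_values[0])
    prompt = [sum(w * v[d] for w, v in zip(weights, pool_values)) for d in range(dim)]
    return prompt, weights


# Toy sizes, chosen only for illustration.
DIM, POOL_SIZE, NUM_TOKENS = 8, 4, 16
rng = random.Random(0)


def random_matrix(rows, cols):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# One pool shared by both modalities.
pool_keys = random_matrix(POOL_SIZE, DIM)
pool_values = random_matrix(POOL_SIZE, DIM)

# Stand-ins for the embedded RGB and event features.
rgb_tokens = random_matrix(NUM_TOKENS, DIM)
event_tokens = random_matrix(NUM_TOKENS, DIM)

rgb_prompt, rgb_weights = generate_prompt(rgb_tokens, pool_keys, pool_values)
event_prompt, event_weights = generate_prompt(event_tokens, pool_keys, pool_values)
```

In this sketch the pool is shared while the queries differ per modality, so the two prompts are drawn from the same entries but mixed with different weights. In a full tracker, each prompt would presumably be placed alongside its modality's token sequence before the backbone; that step is omitted here.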


