Enhancing Traffic Incident Response through Sub-Second Temporal Localization with HybridMamba
Shihab, Ibne Farabi; Akter, Sanjeda; Sharma, Anuj
arXiv.org Artificial Intelligence
Traffic crash detection in long-form surveillance videos is essential for improving emergency response and infrastructure planning, yet it remains difficult because crash events are brief and infrequent. We present HybridMamba, a novel architecture that integrates vision transformers with state-space temporal modeling to achieve high-precision crash time localization. Our approach introduces multi-level token compression and hierarchical temporal processing to maintain computational efficiency without sacrificing temporal resolution. Evaluated on a large-scale dataset from the Iowa Department of Transportation, HybridMamba achieves a mean absolute error of 1.50 seconds on 2-minute videos (p < 0.01 compared to baselines), with 65.2% of predictions falling within one second of the ground truth. It outperforms recent video-language models (e.g., TimeChat, VideoLLaMA-2) by up to 3.95 seconds while using significantly fewer parameters (3B vs. 13–72B). Our results demonstrate effective temporal localization across video durations of 2–40 minutes and diverse environmental conditions, highlighting HybridMamba's potential for fine-grained temporal localization in traffic surveillance while identifying challenges that remain for extended deployment.
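The two headline metrics can be made concrete with a short sketch. It assumes the mean absolute error is the average absolute difference, in seconds, between predicted and annotated crash times, and that "within one second" means an absolute error of at most 1.0 s (the inclusive bound is an assumption). The function names and the four sample values are hypothetical and are not data from the paper.

```python
from typing import Sequence


def _check(pred_s: Sequence[float], true_s: Sequence[float]) -> None:
    # Both sequences must be non-empty and aligned one-to-one (one crash per video).
    if not pred_s or len(pred_s) != len(true_s):
        raise ValueError("need two non-empty sequences of equal length")


def mean_absolute_error(pred_s: Sequence[float], true_s: Sequence[float]) -> float:
    """Average of |predicted time - ground-truth time|, in seconds."""
    _check(pred_s, true_s)
    return sum(abs(p - t) for p, t in zip(pred_s, true_s)) / len(pred_s)


def fraction_within(pred_s: Sequence[float], true_s: Sequence[float],
                    tol_s: float = 1.0) -> float:
    """Share of predictions whose absolute error is at most tol_s seconds
    (inclusive bound is an assumption, not taken from the paper)."""
    _check(pred_s, true_s)
    hits = sum(1 for p, t in zip(pred_s, true_s) if abs(p - t) <= tol_s)
    return hits / len(pred_s)


if __name__ == "__main__":
    # Hypothetical crash times (seconds into a 2-minute clip); illustrative only.
    preds = [12.5, 57.0, 89.25, 101.5]
    truth = [12.0, 55.5, 89.0, 100.0]
    print(mean_absolute_error(preds, truth))  # 0.9375
    print(fraction_within(preds, truth))      # 0.5
```

Here the absolute errors are 0.5, 1.5, 0.25 and 1.5 s, so the MAE is 0.9375 s and two of four predictions (50%) fall within one second.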
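To illustrate in general terms what token compression followed by state-space temporal modeling involves, the sketch below average-pools groups of visual tokens within each frame, pools each frame to one vector, runs a linear diagonal state-space recurrence h_t = a·h_{t-1} + b·x_t, y_t = c·h_t over the frame sequence, and returns the time of the highest-scoring frame. It is a hypothetical, heavily simplified illustration, not HybridMamba itself: Mamba's selective scan makes its parameters input-dependent, whereas this recurrence is time-invariant, and the pooling scheme, the parameters a, b, c, the linear scoring head w, and all function names are assumptions.

```python
import numpy as np


def compress_tokens(frame_tokens: np.ndarray, group: int) -> np.ndarray:
    """Average-pool each run of `group` consecutive tokens in every frame.
    Shape (T, N, D) -> (T, N // group, D); assumes N is divisible by group."""
    T, N, D = frame_tokens.shape
    if N % group:
        raise ValueError("token count must be divisible by group")
    return frame_tokens.reshape(T, N // group, group, D).mean(axis=2)


def diagonal_ssm_scan(x, a, b, c) -> np.ndarray:
    """Time-invariant diagonal SSM applied elementwise per channel:
    h_t = a * h_{t-1} + b * x_t,  y_t = c * h_t.  x: (T, D); a, b, c: (D,)."""
    x = np.asarray(x, dtype=float)
    h = np.zeros(x.shape[1])
    y = np.empty_like(x)
    for t in range(x.shape[0]):  # one pass over time: cost is linear in T
        h = a * h + b * x[t]
        y[t] = c * h
    return y


def predict_crash_time(frame_tokens, fps, group, a, b, c, w) -> float:
    """Hypothetical pipeline: compress tokens, pool to one vector per frame,
    scan over time, score frames with linear head w, return argmax time (s)."""
    compressed = compress_tokens(frame_tokens, group)  # (T, N // group, D)
    per_frame = compressed.mean(axis=1)                # (T, D)
    temporal = diagonal_ssm_scan(per_frame, a, b, c)   # (T, D)
    scores = temporal @ w                              # (T,)
    return float(np.argmax(scores)) / fps


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, N, D = 8, 16, 4  # frames, tokens per frame, channels (toy sizes)
    tokens = rng.normal(size=(T, N, D))
    a, b, c = np.full(D, 0.9), np.ones(D), np.ones(D)
    w = rng.normal(size=D)  # untrained head: output only demonstrates shapes
    print(predict_crash_time(tokens, fps=2.0, group=4, a=a, b=b, c=c, w=w))
```

Because the recurrence visits each frame once, its cost grows linearly with video length, which is the usual motivation for state-space models over self-attention, whose cost grows quadratically with sequence length.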
Sep-16-2025