LoRATv2: Enabling Low-Cost Temporal Modeling in One-Stream Trackers
Neural Information Processing Systems
Transformer-based algorithms, such as LoRAT, have significantly improved object tracking performance. However, these approaches rely on a standard attention mechanism whose cost grows quadratically with the number of tokens, which makes real-time inference computationally expensive. In this paper, we introduce LoRATv2, a novel tracking framework that addresses these limitations with three main contributions. First, LoRATv2 integrates frame-wise causal attention, which keeps full self-attention within each frame while allowing only causal dependencies across frames, significantly reducing computational overhead. Moreover, key-value (KV) caching is employed to reuse the embeddings of past frames efficiently for a further speedup.
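The two mechanisms named above can be illustrated with a minimal NumPy sketch. It is not LoRATv2's implementation: it assumes a single attention head, a single layer, random projection matrices, and hypothetical per-frame token counts. `frame_causal_mask` (a hypothetical helper) builds the block-causal pattern, with full attention inside a frame and attention only to earlier frames across frames. The hypothetical `KVCache` class appends each new frame's keys and values so that only the new frame's queries have to be processed. The final check confirms that streaming frames one at a time through the cache reproduces the all-at-once masked computation.

```python
import numpy as np


def frame_causal_mask(frame_lengths):
    """Boolean mask where entry (i, j) is True if query token i may attend to key token j.

    Tokens see every token of their own frame (full self-attention) and all
    tokens of earlier frames, but nothing from later frames.
    """
    frame_ids = np.repeat(np.arange(len(frame_lengths)), frame_lengths)
    return frame_ids[:, None] >= frame_ids[None, :]


def masked_attention(q, k, v, mask):
    """Single-head scaled dot-product attention; masked-out pairs get zero weight."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


class KVCache:
    """Hypothetical cache holding the keys/values of all frames processed so far."""

    def __init__(self, dim):
        self.k = np.zeros((0, dim))
        self.v = np.zeros((0, dim))

    def step(self, q_new, k_new, v_new):
        # Append the new frame's keys/values; past entries are reused, not recomputed.
        self.k = np.concatenate([self.k, k_new])
        self.v = np.concatenate([self.v, v_new])
        # New queries may see every cached token: past frames plus their own frame.
        mask = np.ones((q_new.shape[0], self.k.shape[0]), dtype=bool)
        return masked_attention(q_new, self.k, self.v, mask)


rng = np.random.default_rng(0)
dim = 8
frame_lengths = [4, 3, 3]  # hypothetical token counts per frame
x = rng.standard_normal((sum(frame_lengths), dim))
w_q, w_k, w_v = (rng.standard_normal((dim, dim)) for _ in range(3))
q, k, v = x @ w_q, x @ w_k, x @ w_v

# Reference: all frames at once under the frame-wise causal mask.
mask = frame_causal_mask(frame_lengths)
full = masked_attention(q, k, v, mask)

# Streaming: one frame at a time, reusing cached keys/values.
cache = KVCache(dim)
outputs, start = [], 0
for length in frame_lengths:
    frame = slice(start, start + length)
    outputs.append(cache.step(q[frame], k[frame], v[frame]))
    start += length
incremental = np.concatenate(outputs)

print(int(mask.sum()), mask.size)  # allowed query-key pairs vs. full attention: 67 100
print(np.allclose(full, incremental))  # True
```

Because no token attends to a later frame, the keys and values of past frames do not change when a new frame arrives, which is what makes reusing them valid. In a deeper network the same argument holds layer by layer; this sketch covers only one layer.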
Jun-21-2026, 18:25:34 GMT
- Genre:
- Research Report > Experimental Study (1.00)
- Industry:
- Information Technology (0.46)
- Technology:
- Information Technology > Artificial Intelligence
- Natural Language (1.00)
- Vision (0.93)
- Machine Learning > Neural Networks (0.88)