Deep Attentive Tracking via Reciprocative Learning

Pu, Shi; Song, Yibing; Ma, Chao; Zhang, Honggang; Yang, Ming-Hsuan

Neural Information Processing Systems 

Visual attention, a concept derived from cognitive neuroscience, focuses human perception on the most pertinent subset of sensory data. Recently, significant efforts have been made to exploit attention schemes to advance computer vision systems. In visual tracking, it is often challenging to track target objects undergoing large appearance changes. Attention maps facilitate visual tracking by selectively attending to temporally robust features. Existing tracking-by-detection approaches mainly use additional attention modules to generate feature weights, as the classifiers are not equipped with such mechanisms.