Deep Alternative Neural Network: Exploring Contexts as Early as Possible for Action Recognition

Jinzhuo Wang, Wenmin Wang, Xiongtao Chen, Ronggang Wang, Wen Gao

Neural Information Processing Systems 

Contexts are crucial for action recognition in video. Current methods often mine contexts only after extracting hierarchical local features, and focus on high-order encodings of those features. This paper instead explores contexts as early as possible and leverages their evolution for action recognition. In particular, we introduce a novel architecture called the deep alternative neural network (DANN), which is built by stacking alternative layers. Each alternative layer consists of a volumetric convolutional layer followed by a recurrent layer.
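To make the layer structure concrete, the following is a minimal sketch of one possible reading of an alternative layer, not the authors' implementation. It assumes single-channel volumes stored as nested lists indexed [T][H][W], a zero-padded cubic kernel, and a recurrent layer that repeatedly re-convolves its own output while the feed-forward response stays fixed. The names conv3d_same, alternative_layer, dann_stack, and the steps parameter are all hypothetical; the abstract does not specify the recurrence, the nonlinearity, or the number of iterations.

```python
def conv3d_same(volume, kernel):
    """Zero-padded volumetric cross-correlation with a cubic, odd-sized kernel."""
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    k = len(kernel)
    r = k // 2
    out = [[[0.0] * W for _ in range(H)] for _ in range(T)]
    for t in range(T):
        for y in range(H):
            for x in range(W):
                acc = 0.0
                for dt in range(k):
                    for dy in range(k):
                        for dx in range(k):
                            tt, yy, xx = t + dt - r, y + dy - r, x + dx - r
                            # Positions outside the volume count as zeros.
                            if 0 <= tt < T and 0 <= yy < H and 0 <= xx < W:
                                acc += kernel[dt][dy][dx] * volume[tt][yy][xx]
                out[t][y][x] = acc
    return out


def relu_sum(a, b):
    """Element-wise ReLU(a + b) for two volumes of equal shape."""
    return [[[max(0.0, p + q) for p, q in zip(row_a, row_b)]
             for row_a, row_b in zip(plane_a, plane_b)]
            for plane_a, plane_b in zip(a, b)]


def alternative_layer(volume, w_ff, w_rec, steps=3):
    """Volumetric convolution followed by an (assumed) recurrent layer."""
    ff = conv3d_same(volume, w_ff)  # feed-forward response, fixed over iterations
    zeros = [[[0.0] * len(row) for row in plane] for plane in ff]
    state = relu_sum(ff, zeros)  # initial state: ReLU of the feed-forward response
    for _ in range(steps):
        # Assumption: each iteration mixes the fixed feed-forward response with
        # a convolution of the previous state, so the context seen by each unit
        # grows with every step.
        state = relu_sum(ff, conv3d_same(state, w_rec))
    return state


def dann_stack(volume, layers, steps=3):
    """Stack alternative layers; `layers` is a list of (w_ff, w_rec) kernel pairs."""
    for w_ff, w_rec in layers:
        volume = alternative_layer(volume, w_ff, w_rec, steps)
    return volume
```

In this sketch, each recurrent iteration widens the spatio-temporal neighborhood that influences a unit, which is one way to read "exploring contexts as early as possible": context is gathered inside every layer rather than only after the final feature hierarchy. A practical model would use multiple channels, learned weights, and a tensor library instead of nested lists.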