Two-Stream Convolutional Networks for Action Recognition in Videos

Karen Simonyan, Andrew Zisserman

Neural Information Processing Systems 

We investigate architectures of discriminatively trained deep Convolutional Networks (ConvNets) for action recognition in video. The challenge is to capture two complementary kinds of information: appearance from still frames and motion between frames. We also aim to generalise the best performing hand-crafted features within a data-driven learning framework.