Spatiotemporal Residual Networks for Video Action Recognition

Feichtenhofer, Christoph, Pinz, Axel, Wildes, Richard

Feb-14-2020, 13:58:05 GMT, Neural Information Processing Systems

Two-stream Convolutional Networks (ConvNets) have shown strong performance for human action recognition in videos. Recently, Residual Networks (ResNets) have arisen as a new technique to train extremely deep architectures. In this paper, we introduce spatiotemporal ResNets as a combination of these two approaches. First, we inject residual connections between the appearance and motion pathways of a two-stream architecture to allow spatiotemporal interaction between the two streams. Second, we transform pretrained image ConvNets into spatiotemporal networks by equipping these with learnable convolutional filters that are initialized as temporal residual connections and operate on adjacent feature maps in time.
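The two mechanisms named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The function names (`temporal_conv`, `identity_temporal_kernel`, `inject_motion`), the additive form of the cross-stream connection, and the centre-tap identity initialization are all assumptions made for illustration, since the abstract does not specify these details. Each frame is represented as a flat feature vector rather than a full feature map.

```python
# Illustrative sketch only: names, the additive cross-stream form, and the
# identity initialization are assumptions, not details taken from the paper.

def temporal_conv(frames, kernel):
    """1D convolution over the time axis with zero padding.

    frames: list of T feature vectors (one list of floats per time step).
    kernel: list of odd length holding one scalar weight per temporal offset.
    """
    half = len(kernel) // 2
    num_frames = len(frames)
    dim = len(frames[0])
    out = []
    for t in range(num_frames):
        acc = [0.0] * dim
        for k, weight in enumerate(kernel):
            s = t + k - half  # index of the neighbouring frame
            if 0 <= s < num_frames:
                for i in range(dim):
                    acc[i] += weight * frames[s][i]
        out.append(acc)
    return out


def identity_temporal_kernel(width=3):
    """Kernel whose centre tap is 1 and whose other taps are 0.

    With this (assumed) initialization the temporal filter initially passes
    each frame through unchanged, so the pretrained per-frame behaviour is
    preserved until training updates the weights.
    """
    kernel = [0.0] * width
    kernel[width // 2] = 1.0
    return kernel


def inject_motion(appearance, motion):
    """Additive connection from the motion stream into the appearance stream."""
    return [
        [a + m for a, m in zip(frame_a, frame_m)]
        for frame_a, frame_m in zip(appearance, motion)
    ]


if __name__ == "__main__":
    appearance = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    motion = [[0.5, 0.5], [1.0, 1.0], [1.5, 1.5]]
    fused = inject_motion(appearance, motion)
    print(temporal_conv(fused, identity_temporal_kernel()))
```

At initialization the temporal filter leaves the fused features untouched. Once the off-centre taps become non-zero, each output frame mixes information from its temporal neighbours.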

resnet, spatiotemporal residual network, video action recognition

Technology:
- Information Technology > Artificial Intelligence
  - Vision (0.70)
  - Machine Learning > Neural Networks
    - Deep Learning (0.44)