Reviews: Spatiotemporal Residual Networks for Video Action Recognition
–Neural Information Processing Systems
This paper presents a framework that improves two-stream networks for video action recognition by extending residual networks to combine information from the two streams in a single network. It significantly improves over the previous state of the art on two popular video action recognition benchmarks. The downside of this paper is its limited novelty. Previous work has tried to combine two streams into a single network [1,2], and temporal convolution is not new either [3]. Although the way the two streams are combined differs slightly from previous work, the proposed approach is still fairly straightforward.
Jan-20-2025, 09:59:34 GMT