VideoCapsuleNet: A Simplified Network for Action Detection
Kevin Duarte, Yogesh Rawat, Mubarak Shah
Neural Information Processing Systems
We propose a 3D capsule network for videos, called VideoCapsuleNet: a unified network for action detection that jointly performs pixel-wise action segmentation and action classification. The proposed network generalizes the capsule network from 2D to 3D and takes a sequence of video frames as input. This 3D generalization drastically increases the number of capsules in the network, making capsule routing computationally expensive.
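To give a rough sense of why the move from 2D to 3D makes routing expensive, the sketch below counts the votes a single convolutional capsule layer would have to route. It is a back-of-the-envelope illustration only: the function name `routing_votes`, the grid sizes, the kernel sizes, and the capsule-type counts are all hypothetical and are not taken from the paper. It assumes stride 1, same-size output, and one vote from every input capsule in the receptive field to every output capsule type.

```python
import math


def routing_votes(grid, kernel, in_types, out_types):
    """Count votes for one convolutional capsule layer (hypothetical setup).

    Assumes stride 1 and an output grid the same size as the input grid.
    Each output position receives one vote per input capsule in its
    receptive field, for every (input type, output type) pair.
    """
    positions = math.prod(grid)          # number of output positions
    receptive_field = math.prod(kernel)  # input positions seen per output
    return positions * receptive_field * in_types * out_types


# Hypothetical 2D layer: 20x20 grid, 3x3 kernel, 32 capsule types in and out.
votes_2d = routing_votes((20, 20), (3, 3), 32, 32)

# Same layer extended with a temporal axis: 8 frames, 3x3x3 kernel.
votes_3d = routing_votes((8, 20, 20), (3, 3, 3), 32, 32)

print(votes_2d, votes_3d, votes_3d // votes_2d)
```

Under these assumed sizes the temporal axis multiplies the vote count by the number of frames times the temporal kernel extent, which is the kind of growth that makes routing costly in 3D.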
Feb-13-2026, 06:17:56 GMT