capsule
Appendix for Unsupervised Motion Representation Learning with Capsule Autoencoders
The table below lists the notation, grouped by module. The values used in our implementation are shown where applicable. The necessity of a two-layer hierarchy is briefly discussed in Section 3.3. In short, it is difficult for a single-layer hierarchy to capture long-term dependencies and variations. This section describes an empirical study in which we compare MCAE with its single-layer counterpart.
VideoCapsuleNet: A Simplified Network for Action Detection
Recent advances in Deep Convolutional Neural Networks (DCNNs) have shown extremely good results for video human action classification; however, action detection is still a challenging problem. Current action detection approaches follow a complex pipeline that involves multiple tasks such as tube proposals, optical flow, and tube classification. In this work, we present a more elegant solution for action detection based on the recently developed capsule network. We propose a 3D capsule network for videos, called VideoCapsuleNet: a unified network for action detection that jointly performs pixel-wise action segmentation and action classification. The proposed network is a generalization of the capsule network from 2D to 3D and takes a sequence of video frames as input.
Self-Routing Capsule Networks
Taeyoung Hahn, Myeongjang Pyeon, Gunhee Kim
In this work, we propose a novel and surprisingly simple routing strategy called self-routing, where each capsule is routed independently by its subordinate routing network. Agreement between capsules is therefore no longer required; instead, both the poses and the activations of upper-level capsules are obtained in a way similar to Mixture-of-Experts. Our experiments on CIFAR10, SVHN, and SmallNORB show that self-routing performs more robustly against white-box adversarial attacks and affine transformations while requiring less computation.
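The routing idea in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function `self_routing`, the weight layouts `w_route` and `w_pose`, the omission of bias terms, and the activation-weighted averaging are all one plausible reading of "routed independently" and "similar to Mixture-of-Experts". Each lower capsule computes its routing coefficients from its own pose alone, so no iterative agreement step appears.

```python
import math
import random


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(mat, vec):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * v for w, v in zip(row, vec)) for row in mat]


def self_routing(poses, acts, w_route, w_pose):
    """Hypothetical single-pass routing without agreement.

    poses:   n_in pose vectors, each of length d_in
    acts:    n_in positive activations
    w_route: w_route[i] is an (n_out x d_in) routing matrix owned by capsule i
    w_pose:  w_pose[i][j] is a (d_out x d_in) vote matrix from capsule i to j
    """
    n_in = len(poses)
    n_out = len(w_route[0])
    d_out = len(w_pose[0][0])

    # Each lower capsule gates its own output, like an expert router.
    coeffs = [softmax(matvec(w_route[i], poses[i])) for i in range(n_in)]

    total_act = sum(acts)
    out_poses, out_acts = [], []
    for j in range(n_out):
        # Routing coefficient scaled by how active the lower capsule is.
        weights = [coeffs[i][j] * acts[i] for i in range(n_in)]
        weight_sum = sum(weights)
        out_acts.append(weight_sum / total_act)

        # Upper pose is the weighted mean of the lower capsules' votes.
        pose = [0.0] * d_out
        for i in range(n_in):
            vote = matvec(w_pose[i][j], poses[i])
            for k in range(d_out):
                pose[k] += weights[i] * vote[k] / weight_sum
        out_poses.append(pose)
    return out_poses, out_acts


def rand_matrix(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


# Toy sizes chosen only for illustration.
random.seed(0)
n_in, n_out, d_in, d_out = 4, 3, 5, 2
poses = rand_matrix(n_in, d_in)
acts = [random.uniform(0.1, 1.0) for _ in range(n_in)]
w_route = [rand_matrix(n_out, d_in) for _ in range(n_in)]
w_pose = [[rand_matrix(d_out, d_in) for _ in range(n_out)] for _ in range(n_in)]
out_poses, out_acts = self_routing(poses, acts, w_route, w_pose)
```

Because each capsule's coefficients sum to one, the upper-level activations in this sketch also sum to one, and the whole computation is a single forward pass with no routing iterations.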
VideoCapsuleNet: A Simplified Network for Action Detection
Kevin Duarte, Yogesh Rawat, Mubarak Shah
We propose a 3D capsule network for videos, called VideoCapsuleNet: a unified network for action detection that jointly performs pixel-wise action segmentation and action classification. The proposed network is a generalization of the capsule network from 2D to 3D and takes a sequence of video frames as input. The 3D generalization drastically increases the number of capsules in the network, making capsule routing computationally expensive.
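To make the cost argument concrete, the following back-of-the-envelope sketch counts capsules and routing pairs for a 2D and a 3D capsule layer. All sizes are hypothetical and are not taken from VideoCapsuleNet; the helper names `capsule_count` and `routing_pairs` are illustrative, and the pair count assumes every lower capsule votes for every upper capsule.

```python
from math import prod


def capsule_count(grid, types):
    # One capsule per grid position per capsule type.
    return prod(grid) * types


def routing_pairs(n_lower, n_upper):
    # Fully connected routing: every lower capsule votes for every upper one.
    return n_lower * n_upper


# Hypothetical layer sizes: a 20x20 spatial grid, 8 frames, 32 capsule types.
caps_2d = capsule_count((20, 20), 32)
caps_3d = capsule_count((8, 20, 20), 32)

# Hypothetical upper layer with 16 capsules.
pairs_2d = routing_pairs(caps_2d, 16)
pairs_3d = routing_pairs(caps_3d, 16)

print(caps_2d, caps_3d, pairs_2d, pairs_3d)
```

Under these assumed sizes, adding a temporal axis of 8 frames multiplies both the capsule count and the number of votes to route by 8, which is the kind of growth the abstract refers to.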