Supplementary Material for Self-supervised Co-Training for Video Representation Learning

Neural Information Processing Systems 

We use the S3D architecture for all experiments. In CoCLR, S3D is followed by a non-linear projection head; the projection head is removed when evaluating on downstream tasks. The detailed dimensions are shown in Table 1.

Table 1: Network stages with output sizes (T × HW × C); after S3D followed by average pooling, the output size is 1 × 1 × C.

When evaluating the pretrained representation for action classification, we replace the non-linear projection head with a single linear layer for the classification tasks. The history queue is used in all pretraining experiments (including both InfoNCE and CoCLR).
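The setup above can be sketched as follows. This is a minimal, hypothetical illustration only: the feature dimension, projection-head layout (Linear → ReLU → Linear), queue length, and class count are assumptions, not values from the paper. It shows the two regimes described in the text: during pretraining, backbone features pass through the non-linear projection head and past embeddings accumulate in a FIFO history queue; for downstream evaluation, the head is dropped and a single linear layer is attached instead.

```python
from collections import deque
import numpy as np

# Hypothetical dimensions -- the actual values are given in Table 1 of the paper.
FEATURE_DIM = 1024   # S3D output after average pooling (assumed)
PROJ_DIM = 128       # projection-head output dimension (assumed)
QUEUE_SIZE = 2048    # history-queue capacity (assumed)
NUM_CLASSES = 101    # e.g. UCF101 for downstream action classification

rng = np.random.default_rng(0)

class MLPHead:
    """Non-linear projection head: Linear -> ReLU -> Linear.
    This two-layer layout is a common choice and an assumption here."""
    def __init__(self, d_in, d_out):
        self.w1 = rng.standard_normal((d_in, d_in)) * 0.01
        self.w2 = rng.standard_normal((d_in, d_out)) * 0.01
    def __call__(self, x):
        h = np.maximum(x @ self.w1, 0.0)                       # ReLU
        z = h @ self.w2
        return z / np.linalg.norm(z, axis=-1, keepdims=True)   # L2-normalise

class LinearClassifier:
    """Single linear layer that replaces the projection head at evaluation."""
    def __init__(self, d_in, n_classes):
        self.w = rng.standard_normal((d_in, n_classes)) * 0.01
    def __call__(self, x):
        return x @ self.w

# --- Pretraining: backbone features -> projection head -> history queue ---
queue = deque(maxlen=QUEUE_SIZE)                # FIFO queue of past embeddings
head = MLPHead(FEATURE_DIM, PROJ_DIM)

feats = rng.standard_normal((8, FEATURE_DIM))   # stand-in for S3D features
z = head(feats)                                 # (8, PROJ_DIM), unit-norm rows
queue.extend(z)                                 # enqueue as future negatives

# --- Evaluation: drop the head, attach a single linear classifier ---
clf = LinearClassifier(FEATURE_DIM, NUM_CLASSES)
logits = clf(feats)                             # (8, NUM_CLASSES)
```

The `deque(maxlen=...)` gives the queue its FIFO behaviour for free: once `QUEUE_SIZE` embeddings have been enqueued, the oldest are discarded automatically.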
