Supplementary Material for Self-supervised Co-Training for Video Representation Learning
–Neural Information Processing Systems
We use the S3D architecture for all experiments. During pretraining (including CoCLR), S3D is followed by a non-linear projection head. The projection head is removed when evaluating downstream tasks. The detailed dimensions are shown in Table 1.

Table 1 (only partially recoverable). Columns: Stage; Detail; Output size (T, HW, C). Surviving row: S3D followed by average pooling; output size 1 1 (remaining entries missing).

When evaluating the pretrained representation for action classification, we replace the non-linear projection head with a single linear layer. The history queue is used in all pretraining experiments (both InfoNCE and CoCLR).
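The history queue mentioned above can be pictured as a fixed-size first-in, first-out buffer of past feature vectors. The following is a minimal sketch under stated assumptions, not the authors' implementation: the class name `HistoryQueue`, its methods, the queue size, and the toy two-dimensional features are all hypothetical, and the text does not specify how the queue is sized or updated.

```python
from collections import deque


class HistoryQueue:
    """Fixed-size FIFO buffer of past feature vectors (hypothetical helper)."""

    def __init__(self, max_size):
        if max_size <= 0:
            raise ValueError("max_size must be positive")
        # A deque with maxlen silently drops the oldest entry once it is full.
        self._buffer = deque(maxlen=max_size)

    def enqueue(self, features):
        """Add a batch of feature vectors; the oldest entries are evicted first."""
        for feature in features:
            self._buffer.append(tuple(feature))

    def stored(self):
        """Return the stored features, oldest first."""
        return list(self._buffer)

    def __len__(self):
        return len(self._buffer)


# Toy usage: the queue size of 4 is an assumption chosen for illustration only.
queue = HistoryQueue(max_size=4)
queue.enqueue([(0.1, 0.2), (0.3, 0.4), (0.5, 0.6)])
queue.enqueue([(0.7, 0.8), (0.9, 1.0)])  # evicts (0.1, 0.2)
```

In this sketch the second batch pushes the queue past its capacity, so the oldest feature is dropped and the four most recent ones remain available for comparison against new samples.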