Supplementary Material for Self-supervised Co-Training for Video Representation Learning

Neural Information Processing Systems 

We use the S3D architecture for all experiments. In CoCLR, S3D is followed by a non-linear projection head; the projection head is removed when evaluating on downstream tasks. The detailed dimensions are shown in Table 1.

Table 1: Network stages with output sizes (T × HW × C); after S3D followed by average pooling, the output size is 1 × 1 × C.

When evaluating the pretrained representation for action classification, we replace the non-linear projection head with a single linear layer for the classification tasks. The history queue is used in all pretraining experiments (including both InfoNCE and CoCLR).
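The setup above can be sketched as follows. This is a minimal, hypothetical illustration only: the feature dimension, projection-head layout (Linear → ReLU → Linear), queue length, and class count are assumptions, not values from the paper. It shows the two regimes described in the text: during pretraining, backbone features pass through the non-linear projection head and past embeddings accumulate in a FIFO history queue; for downstream evaluation, the head is dropped and a single linear layer is attached instead.

```python
from collections import deque
import numpy as np

# Hypothetical dimensions -- the actual values are given in Table 1 of the paper.
FEATURE_DIM = 1024   # S3D output after average pooling (assumed)
PROJ_DIM = 128       # projection-head output dimension (assumed)
QUEUE_SIZE = 2048    # history-queue capacity (assumed)
NUM_CLASSES = 101    # e.g. UCF101 for downstream action classification

rng = np.random.default_rng(0)

class MLPHead:
    """Non-linear projection head: Linear -> ReLU -> Linear.
    This two-layer layout is a common choice and an assumption here."""
    def __init__(self, d_in, d_out):
        self.w1 = rng.standard_normal((d_in, d_in)) * 0.01
        self.w2 = rng.standard_normal((d_in, d_out)) * 0.01
    def __call__(self, x):
        h = np.maximum(x @ self.w1, 0.0)                       # ReLU
        z = h @ self.w2
        return z / np.linalg.norm(z, axis=-1, keepdims=True)   # L2-normalise

class LinearClassifier:
    """Single linear layer that replaces the projection head at evaluation."""
    def __init__(self, d_in, n_classes):
        self.w = rng.standard_normal((d_in, n_classes)) * 0.01
    def __call__(self, x):
        return x @ self.w

# --- Pretraining: backbone features -> projection head -> history queue ---
queue = deque(maxlen=QUEUE_SIZE)                # FIFO queue of past embeddings
head = MLPHead(FEATURE_DIM, PROJ_DIM)

feats = rng.standard_normal((8, FEATURE_DIM))   # stand-in for S3D features
z = head(feats)                                 # (8, PROJ_DIM), unit-norm rows
queue.extend(z)                                 # enqueue as future negatives

# --- Evaluation: drop the head, attach a single linear classifier ---
clf = LinearClassifier(FEATURE_DIM, NUM_CLASSES)
logits = clf(feats)                             # (8, NUM_CLASSES)
```

The `deque(maxlen=...)` gives the queue its FIFO behaviour for free: once `QUEUE_SIZE` embeddings have been enqueued, the oldest are discarded automatically.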
