A Appendix: Implementation details of our method

All experiments are implemented in PyTorch [10].
Partial Fine-tuning: we fine-tune only the last Transformer block and the classifier layer.

Experiments with other foundation models: we observe that, with the same model, CLIP pre-training is superior to ImageNet-21K pre-training, which is unsurprising given the difference in training-data scale and richness. All other training settings are identical to those used for Kinetics-400.

Inference Speed: we provide an inference speed test in Table 4. We compare with methods that take only RGB frames as input (i.e., without optical flow).
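The partial fine-tuning setup can be sketched in PyTorch as follows. This is a minimal illustration, not the paper's actual code: the backbone below (four generic encoder blocks with small dimensions) is a stand-in for the real foundation model, whose architecture is not detailed in this section.

```python
import torch
import torch.nn as nn

# Sketch of partial fine-tuning: freeze the whole backbone except the last
# Transformer block and the classifier head, then optimize only those.
# The backbone here (4 generic encoder blocks, small dims for illustration)
# is a hypothetical stand-in for the actual foundation model.
blocks = nn.ModuleList(
    [nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True) for _ in range(4)]
)
head = nn.Linear(64, 400)  # e.g. 400 classes for Kinetics-400
model = nn.ModuleDict({"blocks": blocks, "head": head})

# 1) Freeze everything.
for p in model.parameters():
    p.requires_grad = False
# 2) Unfreeze only the last block and the classifier layer.
for p in model["blocks"][-1].parameters():
    p.requires_grad = True
for p in model["head"].parameters():
    p.requires_grad = True

# 3) Hand the optimizer only the trainable subset.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-4)
```

Passing only the trainable parameters to the optimizer avoids allocating optimizer state (e.g. AdamW moment buffers) for the frozen backbone.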
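An inference speed test of the kind reported in Table 4 can be sketched as below. The tiny model and input shapes are hypothetical placeholders for the actual video backbone and resolution; the point is the measurement pattern (eval mode, no gradients, warm-up iterations, then timed throughput).

```python
import time
import torch

# Sketch of a wall-clock inference speed test (throughput in clips/s) on
# RGB-only input. The model and shapes are placeholders, not the paper's.
model = torch.nn.Sequential(
    torch.nn.Flatten(),                      # (B, C*T*H*W)
    torch.nn.Linear(3 * 8 * 32 * 32, 400),   # toy classifier over 400 classes
).eval()
x = torch.randn(4, 3, 8, 32, 32)             # (batch, channels, frames, H, W)

with torch.no_grad():
    for _ in range(3):                       # warm-up to exclude one-off costs
        model(x)
    n_iters = 10
    t0 = time.perf_counter()
    for _ in range(n_iters):
        model(x)
    elapsed = time.perf_counter() - t0

throughput = n_iters * x.shape[0] / elapsed  # clips per second
print(f"{throughput:.1f} clips/s")
```

When timing on a GPU, `torch.cuda.synchronize()` must be called before reading the timer, since CUDA kernel launches are asynchronous.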