A Implementation Details
–Neural Information Processing Systems
For fine-tuning, we add a linear classifier layer to the encoder's averaged tokens [18]. In Table 3 we have compared with ImageNet-based supervised/MAE pre-training. Table 7 compares on Kinetics-400 (K400). The "extra data" column specifies the data used in addition to K400. This table does not include results using K700, because the K700 training set has 13.9k videos duplicated with [...]. Results with K700 are in Table 8 (AVA) and Table 9 (SSv2).
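The fine-tuning head described above (a linear classifier on the encoder's averaged tokens) can be illustrated with a minimal sketch. It is written in plain Python for readability and is not the authors' implementation. The function names (`mean_pool`, `linear`, `classify`), the token count, the embedding width, the number of classes, and the weight initialization are all hypothetical choices made for illustration.

```python
import random


def mean_pool(tokens):
    """Average a list of token vectors (each a list of floats) into one vector."""
    n = len(tokens)
    dim = len(tokens[0])
    return [sum(tok[d] for tok in tokens) / n for d in range(dim)]


def linear(x, weight, bias):
    """Compute logits = W x + b, with weight shaped [num_classes][dim]."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def classify(tokens, weight, bias):
    """Linear classifier applied to the averaged encoder tokens."""
    return linear(mean_pool(tokens), weight, bias)


# Hypothetical sizes, chosen only to keep the example small.
rng = random.Random(0)
num_tokens, dim, num_classes = 8, 4, 3

# Stand-in for encoder output: one vector per token.
tokens = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_tokens)]

# Assumed small random initialization for the classifier weights.
weight = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(num_classes)]
bias = [0.0] * num_classes

logits = classify(tokens, weight, bias)  # one score per class
```

In practice the same two steps would typically be a mean over the token axis followed by a dense layer in a deep learning framework; the sketch only makes the order of operations explicit.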
Aug-19-2025, 15:57:24 GMT