A Implementation Details

Neural Information Processing Systems 

For fine-tuning, we add a linear classifier layer to the encoder's averaged tokens [18]. In Table 3 we compare with ImageNet-based supervised/MAE pre-training. Table 7 compares on Kinetics-400 (K400). The "extra data" column specifies the data used in addition to K400. This table does not include results using K700, because the K700 training set has 13.9k videos duplicated with the K400 validation set. Results with K700 are in Table 8 (AVA) and Table 9 (SSv2).
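The fine-tuning head described above (a linear classifier on the encoder's averaged tokens) can be illustrated with a minimal NumPy sketch. This is not the released implementation: the function name `classify_from_tokens`, the tensor shapes, and the random inputs standing in for encoder outputs are all assumptions made for illustration. Only the choice of 400 output classes follows from Kinetics-400.

```python
import numpy as np


def classify_from_tokens(tokens, weight, bias):
    """Average-pool encoder tokens, then apply a linear classifier.

    tokens: (batch, num_tokens, dim) encoder outputs (hypothetical shape)
    weight: (dim, num_classes) classifier weights
    bias:   (num_classes,) classifier bias
    Returns logits of shape (batch, num_classes).
    """
    pooled = tokens.mean(axis=1)      # average over the token axis -> (batch, dim)
    return pooled @ weight + bias     # linear layer -> (batch, num_classes)


# Illustrative sizes only; real encoder widths and token counts will differ.
rng = np.random.default_rng(0)
batch, num_tokens, dim, num_classes = 2, 8, 16, 400
tokens = rng.standard_normal((batch, num_tokens, dim))   # stand-in for encoder output
weight = rng.standard_normal((dim, num_classes)) * 0.02  # small random init (assumption)
bias = np.zeros(num_classes)

logits = classify_from_tokens(tokens, weight, bias)
predicted_class = logits.argmax(axis=1)  # one class index per video clip
```

Because the tokens are averaged before the linear layer, the head is indifferent to token order and to the number of tokens, so the same classifier weights apply regardless of how many spatiotemporal patches the encoder emits.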
