A Implementation Details

Neural Information Processing Systems 

Dataset Sampling We sample 1% (10%) of the images per class from each dataset for the semi-supervised learning experiments with 1% (10%) labels. The final Semi-ViT-Small model uses DINO self-pretraining.

Supervised Fine-tuning Settings The settings for the supervised fine-tuning stage, with and without self-pretraining, are shown in Table 12.

Computing Resources We run all experiments on V100 GPUs with 32 GB of memory.

Random Seeds and Error Bar Since some of the experiments are expensive to run, e.g., Semi-ViT-Huge.
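The per-class sampling described under Dataset Sampling can be illustrated with a minimal sketch. This is not the authors' code: the function name `sample_labeled_subset`, the rounding rule, the fixed seed, and the choice to keep at least one image per class are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def sample_labeled_subset(labels, fraction, seed=0):
    """Return sorted indices of a per-class sample of roughly `fraction` of each class.

    `labels` is a sequence of sortable class ids, one per image (hypothetical input format).
    """
    rng = random.Random(seed)  # local RNG so the split is reproducible

    # Group image indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    chosen = []
    for label in sorted(by_class):
        indices = by_class[label]
        # Assumption: round to the nearest count and keep at least one image per class.
        k = max(1, round(len(indices) * fraction))
        chosen.extend(rng.sample(indices, k))
    return sorted(chosen)


# Toy usage: 5 classes with 200 images each, keeping 10% of the labels.
toy_labels = [c for c in range(5) for _ in range(200)]
labeled_indices = sample_labeled_subset(toy_labels, fraction=0.10)
```

The returned indices would form the labeled split, with the remaining images treated as unlabeled. Sampling within each class, rather than over the whole dataset, keeps the class proportions of the labeled split equal to those of the full dataset.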