Neural Information Processing Systems 

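The linear lr scaling rule used in this supplement, lr = base_lr × bsz / 256, can be sketched as follows. This is a minimal illustration only: the function name `scaled_lr`, the `reference_batch_size` parameter name, and the example values are assumptions made for this sketch and are not taken from the paper's tables.

```python
def scaled_lr(base_lr: float, batch_size: int, reference_batch_size: int = 256) -> float:
    """Linear lr scaling: lr = base_lr * batch_size / reference_batch_size."""
    if batch_size <= 0 or reference_batch_size <= 0:
        raise ValueError("batch sizes must be positive")
    return base_lr * batch_size / reference_batch_size


if __name__ == "__main__":
    # Illustrative values only (hypothetical, not the paper's settings).
    lr = scaled_lr(base_lr=1.0e-4, batch_size=4096)
    print(f"effective lr: {lr:.6g}")
```

With a batch size equal to the reference of 256, the effective learning rate equals the base learning rate; larger batches scale it up proportionally.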
We use the linear lr scaling rule: lr = base_lr × bsz / 256. For BYOL [6], we did not follow the hyperparameters (blr = 1.0e-4, wd = 0.03) in [4], as we

The hyperparameters for StableRep are presented in Table 3. The computation for StableRep has been converted to SimCLR-equivalent epochs.

We follow the hyperparameter setting used in [7], since it performs better than the setting from the original CLIP [8]. Table 4 summarizes the training details, and Table 5 presents the architecture of the CLIP encoders. With this training setup, we are able to produce 40.2%