Supplementary Material: Improving Transferability of Representations via Augmentation-Aware Self-Supervision

A Trade-off between augmentation invariance and awareness

Neural Information Processing Systems 

To support this, we compute the cosine similarity between representations from augmented and original samples, i.e., $\mathrm{CS} = \mathbb{E}_{x \sim \mathcal{D},\, t \sim \mathcal{T}}\left[\mathrm{sim}\big(g \circ f(t(x)),\; g \circ f(x)\big)\right]$. For linear evaluation benchmarks, we randomly choose validation samples from the training split of each dataset when a validation split is not officially provided. Note that the pretraining setups are the same as those officially used for ImageNet pretraining, as described in [2, 5, 30]. When incorporating our AugSelf into the methods, we use $\lambda = 1.0$ and $\mathcal{A}_{\mathrm{AugSelf}} = \{\mathrm{crop}, \mathrm{color}\}$, unless otherwise stated. Other hyperparameters are the same as in the ImageNet100 setup described in Section F.1.
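As an illustration of the cosine similarity statistic CS defined above, the following is a minimal NumPy sketch, not the implementation used in the paper. It estimates the expectation by drawing one augmentation per sample. The names `mean_cosine_similarity`, `toy_encode`, and `toy_augment` are hypothetical: `toy_encode` stands in for the composed network $g \circ f$ and `toy_augment` for a sampled augmentation $t$, and the data are random vectors rather than images.

```python
import numpy as np


def cosine_similarity(u, v, eps=1e-12):
    """Row-wise cosine similarity between two (n, d) arrays."""
    u_norm = np.linalg.norm(u, axis=1)
    v_norm = np.linalg.norm(v, axis=1)
    # Guard against division by zero for all-zero representations.
    return np.sum(u * v, axis=1) / np.maximum(u_norm * v_norm, eps)


def mean_cosine_similarity(encode, augment, samples, rng):
    """Monte Carlo estimate of CS using one sampled augmentation per input.

    encode  : maps an (n, d_in) batch to (n, d_out) representations
              (assumed stand-in for g composed with f).
    augment : maps (x, rng) to an augmented copy of x (assumed stand-in for t).
    """
    augmented = np.stack([augment(x, rng) for x in samples])
    z_aug = encode(augmented)
    z_orig = encode(samples)
    return float(np.mean(cosine_similarity(z_aug, z_orig)))


# Toy setup (all values below are illustrative assumptions).
rng = np.random.default_rng(0)
samples = rng.normal(size=(8, 16))
weights = rng.normal(size=(16, 4))


def toy_encode(batch):
    # Hypothetical encoder: a fixed random projection followed by tanh.
    return np.tanh(batch @ weights)


def toy_augment(x, rng):
    # Hypothetical augmentation: small additive Gaussian noise.
    return x + 0.1 * rng.normal(size=x.shape)


cs = mean_cosine_similarity(toy_encode, toy_augment, samples, rng)
print(f"estimated CS: {cs:.4f}")
```

Under this definition, a CS value close to 1 indicates that the representation changes little under the sampled augmentations, while lower values indicate that it retains more augmentation-dependent information.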