Adapting Self-Supervised Vision Transformers by Probing Attention-Conditioned Masking Consistency

Neural Information Processing Systems 

Similarly, self-supervised representation learning (SSL) is rapidly replacing supervised learning as the de facto pretraining strategy for deep networks, due to its improved scalability (unlabeled data is easier to collect) and generality (domain-specific SSL is often preferable to one-size-fits-all ImageNet pretraining [16,17]).