Efficient Training of Visual Transformers with Small Datasets

Neural Information Processing Systems 

Our auxiliary self-supervised task is used jointly with standard (supervised) training and does not depend on specific architectural choices, so it can easily be plugged into existing Visual Transformers (VTs).
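The sentence above describes how the auxiliary task is integrated, not what it computes. The following is a minimal sketch under stated assumptions. The auxiliary task is assumed to regress the normalized relative position between randomly sampled pairs of output tokens, and the two losses are assumed to be summed with a weight `lam`. The sketch shows why such a term is architecture-agnostic: it consumes only the classifier logits and the final k×k grid of token embeddings. All names (`combined_loss`, `relative_position_loss`, `lam`, `num_pairs`, and the predictor head) are hypothetical illustrations, not the paper's code.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

# A k x k grid of token embeddings, as produced by the last layer of a VT.
Grid = List[List[List[float]]]
# Hypothetical localization head: concatenated pair embedding -> (du, dv).
Predictor = Callable[[List[float]], Tuple[float, float]]


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Standard supervised loss: -log softmax(logits)[label], computed stably."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[label]


def relative_position_loss(grid: Grid, predictor: Predictor,
                           num_pairs: int, rng: random.Random) -> float:
    """Assumed auxiliary task: regress the normalized offset between two tokens.

    Only the output grid is used, so the loss does not care how the
    backbone produced it.
    """
    k = len(grid)
    total = 0.0
    for _ in range(num_pairs):
        i, j = rng.randrange(k), rng.randrange(k)
        p, q = rng.randrange(k), rng.randrange(k)
        target = ((i - p) / k, (j - q) / k)  # offset normalized by grid size
        du, dv = predictor(grid[i][j] + grid[p][q])  # list concatenation
        total += abs(du - target[0]) + abs(dv - target[1])  # L1 error
    return total / num_pairs


def combined_loss(logits: Sequence[float], label: int, grid: Grid,
                  predictor: Predictor, lam: float = 0.1,
                  num_pairs: int = 8, seed: int = 0) -> float:
    """Supervised cross-entropy plus a lam-weighted auxiliary term (assumed form)."""
    rng = random.Random(seed)
    aux = relative_position_loss(grid, predictor, num_pairs, rng)
    return cross_entropy(logits, label) + lam * aux


if __name__ == "__main__":
    k = 4
    # Toy embeddings that literally encode each token's (row, col) position.
    grid = [[[float(r), float(c)] for c in range(k)] for r in range(k)]
    # An oracle head that reads positions back out, so the auxiliary loss is 0.
    oracle = lambda v: ((v[0] - v[2]) / k, (v[1] - v[3]) / k)
    # A head that always predicts a zero offset, so the auxiliary loss is >= 0.
    zero = lambda v: (0.0, 0.0)
    print(combined_loss([2.0, 0.5, -1.0], 0, grid, oracle))
    print(combined_loss([2.0, 0.5, -1.0], 0, grid, zero))
```

Because `combined_loss` never inspects the backbone, swapping one VT for another only changes how `logits` and `grid` are produced. Setting `lam=0` recovers plain supervised training.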