ViT-VS: On the Applicability of Pretrained Vision Transformer Features for Generalizable Visual Servoing
