ViTaPEs: Visuotactile Position Encodings for Cross-Modal Alignment in Multimodal Transformers

Open in new window