ViTaPEs: Visuotactile Position Encodings for Cross-Modal Alignment in Multimodal Transformers