Parameter-Efficient Cross-lingual Transfer of Vision and Language Models via Translation-based Alignment
