MoMo: A shared encoder Model for text, image and multi-Modal representations