A Survey of Vision-Language Pre-training from the Lens of Multimodal Machine Translation
