MLIM: Vision-and-Language Model Pre-training with Masked Language and Image Modeling
