MLIM: Vision-and-Language Model Pre-training with Masked Language and Image Modeling