MAP: Multimodal Uncertainty-Aware Vision-Language Pre-training Model