UrbanFusion: Stochastic Multimodal Fusion for Contrastive Learning of Robust Spatial Representations