Learning Multimodal Latent Space with EBM Prior and MCMC Inference