MOVER: Multimodal Optimal Transport with Volume-based Embedding Regularization