Amplifying Prominent Representations in Multimodal Learning via Variational Dirichlet Process