Cross-modal Variational Auto-encoder with Distributed Latent Spaces and Associators