Contrastive Multimodal Learning for Emergence of Graphical Sensory-Motor Communication