Vis2Mus: Exploring Multimodal Representation Mapping for Controllable Music Generation