Achieving Cross Modal Generalization with Multimodal Unified Representation
Yan Xia 1 Hai Huang