Universal Multimodal Representation for Language Understanding