Cross-Linked Unified Embedding for cross-modality representation learning