Towards Cross-modal Backward-compatible Representation Learning for Vision-Language Models