MMRL: Multi-Modal Representation Learning for Vision-Language Models