Guiding Cross-Modal Representations with MLLM Priors via Preference Alignment
