ABC: Achieving Better Control of Multimodal Embeddings using VLMs