Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines

Open in new window