Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines