VT-CLIP: Enhancing Vision-Language Models with Visual-guided Texts
