VT-CLIP: Enhancing Vision-Language Models with Visual-guided Texts