Toward Modality Gap: Vision Prototype Learning for Weakly-supervised Semantic Segmentation with CLIP