Intra-Modal Proxy Learning for Zero-Shot Visual Categorization with CLIP