CLIP-S$^4$: Language-Guided Self-Supervised Semantic Segmentation