Towards Open-Vocabulary Semantic Segmentation Without Semantic Labels

Heeseong Shin, Chaehyun Kim

Neural Information Processing Systems 

Large-scale vision-language models like CLIP have demonstrated impressive open-vocabulary capabilities for image-level tasks, excelling in recognizing what objects are present.