Uncovering Prototypical Knowledge for Weakly Open-Vocabulary Semantic Segmentation

Boyang Li

Neural Information Processing Systems 

This paper studies the problem of weakly open-vocabulary semantic segmentation (WOVSS), which learns to segment objects of arbitrary classes using only image-text pairs. Existing works enhance the vanilla vision transformer by introducing explicit grouping recognition, i.e., employing several group tokens/centroids to cluster the image tokens and performing group-text alignment. Nevertheless, these methods suffer from a granularity inconsistency in the usage of group tokens, which are aligned in the all-to-one vs.
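To make the grouping idea concrete, the following is a minimal, hypothetical sketch of clustering image tokens with group tokens and then aligning the resulting groups with text embeddings. It is not the method of this paper or of any specific prior work. Hard argmax assignment, mean pooling of assigned tokens, and cosine similarity are simplifying assumptions (a trained model would need a differentiable assignment), and all function names are illustrative.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Vector) -> Vector:
    """Scale to unit length; a zero vector is returned unchanged."""
    n = math.sqrt(dot(v, v))
    return [x / n for x in v] if n > 0 else list(v)


def assign_to_groups(image_tokens: List[Vector], group_tokens: List[Vector]) -> List[int]:
    """Hard-assign each image token to the group token with the largest dot product (assumption)."""
    return [
        max(range(len(group_tokens)), key=lambda g: dot(t, group_tokens[g]))
        for t in image_tokens
    ]


def pool_groups(image_tokens: List[Vector], assignment: List[int],
                group_tokens: List[Vector]) -> List[Vector]:
    """Average the image tokens assigned to each group; an empty group keeps its group token."""
    dim = len(group_tokens[0])
    pooled = []
    for g, centroid in enumerate(group_tokens):
        members = [t for t, a in zip(image_tokens, assignment) if a == g]
        if members:
            pooled.append([sum(m[d] for m in members) / len(members) for d in range(dim)])
        else:
            pooled.append(list(centroid))
    return pooled


def group_text_alignment(group_embs: List[Vector], text_embs: List[Vector]) -> List[List[float]]:
    """Cosine similarity matrix with shape (num_groups, num_texts)."""
    return [[dot(normalize(g), normalize(t)) for t in text_embs] for g in group_embs]


def segment(image_tokens: List[Vector], group_tokens: List[Vector],
            text_embs: List[Vector]) -> List[int]:
    """Label every image token with the text (class) index best aligned with its group."""
    assignment = assign_to_groups(image_tokens, group_tokens)
    sims = group_text_alignment(pool_groups(image_tokens, assignment, group_tokens), text_embs)
    group_label = [max(range(len(row)), key=row.__getitem__) for row in sims]
    return [group_label[a] for a in assignment]


if __name__ == "__main__":
    # Toy 2-D example: two groups and two class-text embeddings.
    image_tokens = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.8]]
    group_tokens = [[1.0, 0.0], [0.0, 1.0]]
    text_embs = [[0.0, 1.0], [1.0, 0.0]]
    print(segment(image_tokens, group_tokens, text_embs))  # [1, 1, 0, 0]
```

In this toy input, the first two image tokens fall into group 0, whose pooled embedding is closest to text 1, so both receive label 1; the last two tokens follow the same path to label 0. Segmentation quality therefore depends on how well each group, rather than each individual token, aligns with the text.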
