Uncovering Prototypical Knowledge for Weakly Open-Vocabulary Semantic Segmentation
Boyang Li
Neural Information Processing Systems
This paper studies weakly open-vocabulary semantic segmentation (WOVSS), which learns to segment objects of arbitrary classes using only image-text pairs. Existing works enhance the vanilla vision transformer by introducing explicit grouping recognition, i.e., employing several group tokens/centroids to cluster the image tokens and perform group-text alignment. Nevertheless, these methods suffer from a granularity inconsistency regarding the usage of group tokens, which are aligned in the all-to-one vs.
Feb-11-2025, 15:11:02 GMT
- Genre:
- Research Report (0.66)