Uncovering Prototypical Knowledge for Weakly Open-Vocabulary Semantic Segmentation
Boyang Li – Neural Information Processing Systems
This paper studies the problem of weakly open-vocabulary semantic segmentation (WOVSS), which learns to segment objects of arbitrary classes using only image-text pairs. Existing works enhance the vanilla vision transformer by introducing explicit grouping recognition, i.e., employing several group tokens/centroids to cluster the image tokens and perform group-text alignment. Nevertheless, these methods suffer from a granularity inconsistency regarding the usage of group tokens, which are aligned in the all-to-one vs.
May-25-2025, 15:18:08 GMT
- Country:
- Asia
- China (0.28)
- Middle East > Israel (0.14)
- Europe > Switzerland
- Genre:
- Research Report (0.66)
- Technology: