Uncovering Prototypical Knowledge for Weakly Open-Vocabulary Semantic Segmentation

Jan-20-2025, 01:29:12 GMT–Neural Information Processing Systems

This paper studies weakly open-vocabulary semantic segmentation (WOVSS), which learns to segment objects of arbitrary classes using only image-text pairs. Existing works enhance the vanilla vision transformer by introducing explicit grouping recognition, i.e., employing several group tokens/centroids to cluster the image tokens and perform group-text alignment. Nevertheless, these methods suffer from a granularity inconsistency in how the group tokens are used: they are aligned in an all-to-one manner during training but in a one-to-one manner during inference. We argue that this discrepancy arises from the lack of elaborate supervision for each group token. To bridge this granularity gap, this paper explores explicit supervision for the group tokens from prototypical knowledge.

group token, uncovering prototypical knowledge, weakly open-vocabulary semantic segmentation, (3 more...)

Genre:
- Research Report (0.41)

Technology:
- Information Technology > Artificial Intelligence (0.41)