Segment Everything Everywhere All at Once

Neural Information Processing Systems 

In this work, we present SEEM, a promptable and interactive model for segmenting everything everywhere all at once in an image, as shown in Figure 1. In SEEM, we propose a novel decoding mechanism that enables diverse prompting for all types of segmentation tasks, aiming at a universal segmentation interface that behaves like large language models (LLMs). More specifically, SEEM is designed with four desiderata: i) Versatility. We introduce a new visual prompt to unify different spatial queries including points, boxes, scribbles and masks, which can further generalize to a different referring image; ii) Compositionality. We learn a joint visual-semantic space between text and visual prompts, which facilitates the dynamic composition of two prompt types required for various segmentation tasks; iii) Interactivity. We further incorporate learnable memory prompts into the decoder to retain segmentation history via mask-guided cross-attention; iv) Semantic-awareness. We use a text encoder to encode text queries and mask labels into the same semantic space for open-vocabulary segmentation.
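To make the prompting mechanism concrete, below is a minimal PyTorch sketch of how spatial prompts (points, boxes, scribbles, masks) and text prompts could be embedded into one joint visual-semantic space and composed as decoder queries. All names (PromptUnifier, feat_dim, the pooling scheme) are illustrative assumptions for exposition, not SEEM's actual implementation.

```python
# A minimal sketch of SEEM-style unified prompting (illustrative only;
# module names, shapes, and the pooling scheme are assumptions, not the
# paper's actual code).
import torch
import torch.nn as nn

class PromptUnifier(nn.Module):
    """Maps spatial prompts and text prompts into one joint
    visual-semantic embedding space."""
    def __init__(self, feat_dim: int = 256, text_dim: int = 512):
        super().__init__()
        # Projects mask-pooled image features into the joint space.
        self.visual_proj = nn.Linear(feat_dim, feat_dim)
        # Stand-in projection for a frozen text encoder's output.
        self.text_proj = nn.Linear(text_dim, feat_dim)

    def encode_spatial(self, image_feats: torch.Tensor,
                       prompt_mask: torch.Tensor) -> torch.Tensor:
        # image_feats: (B, C, H, W); prompt_mask: (B, 1, H, W), a binary
        # map rasterized from a point, box, scribble, or mask prompt.
        masked = image_feats * prompt_mask
        pooled = masked.sum(dim=(2, 3)) / prompt_mask.sum(dim=(2, 3)).clamp(min=1.0)
        return self.visual_proj(pooled)   # (B, C) visual prompt embedding

    def encode_text(self, text_emb: torch.Tensor) -> torch.Tensor:
        # text_emb: (B, text_dim), e.g. from a CLIP-style text encoder.
        return self.text_proj(text_emb)   # (B, C) text prompt embedding

# Compositionality: both prompt types live in the same space, so they can
# simply be stacked as extra queries for the mask decoder.
unifier = PromptUnifier()
feats = torch.randn(1, 256, 64, 64)
box_mask = torch.zeros(1, 1, 64, 64)
box_mask[..., 20:40, 20:40] = 1.0                 # a box prompt as a mask
queries = torch.stack([unifier.encode_spatial(feats, box_mask),
                       unifier.encode_text(torch.randn(1, 512))], dim=1)
print(queries.shape)                              # (1, 2, 256)
```

Under this reading, interactivity falls out naturally: the mask predicted in one round can be rasterized back into a prompt mask for the next round, so dialog history re-enters the decoder through the same interface.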
