Segment Everything Everywhere All at Once
–Neural Information Processing Systems
In this work, we present SEEM, a promptable and interactive model for segmenting everything everywhere all at once in an image, as shown in Figure 1. In SEEM, we propose a novel decoding mechanism that enables diverse prompting for all types of segmentation tasks, aiming at a universal segmentation interface that behaves like large language models (LLMs). More specifically, SEEM is designed with four desiderata: i) Versatility. We introduce a new visual prompt to unify different spatial queries including points, boxes, scribbles and masks, which can further generalize to a different referring image; ii) Compositionality. We learn a joint visual-semantic space between text and visual prompts, which facilitates the dynamic composition of two prompt types required for various segmentation tasks; iii) Interactivity.
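To make the prompting idea concrete, below is a minimal, hypothetical sketch (not the authors' code) of how a promptable segmenter could encode spatial prompts (points, boxes, scribbles, masks rasterized to a binary map) and text prompts into a shared embedding space, compose them with learnable queries, and decode masks. All module names, dimensions, and the mask-pooling scheme are assumptions for illustration only.

```python
# Illustrative sketch only: a promptable segmentation interface that composes
# visual and text prompts in a joint embedding space before a mask decoder.
# Architecture details here are hypothetical, not SEEM's actual design.
import torch
import torch.nn as nn


class PromptableSegmenter(nn.Module):
    def __init__(self, embed_dim: int = 256, num_queries: int = 100, num_classes: int = 80):
        super().__init__()
        self.image_encoder = nn.Conv2d(3, embed_dim, kernel_size=16, stride=16)
        self.text_proj = nn.Linear(512, embed_dim)   # e.g. from a CLIP-like text encoder
        self.queries = nn.Embedding(num_queries, embed_dim)
        layer = nn.TransformerDecoderLayer(embed_dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=3)
        self.mask_head = nn.Linear(embed_dim, embed_dim)
        self.class_head = nn.Linear(embed_dim, num_classes)

    def encode_visual_prompt(self, feats: torch.Tensor, prompt_mask: torch.Tensor) -> torch.Tensor:
        # Pool image features inside the prompted region -> one prompt embedding (B, 1, C).
        w = prompt_mask.flatten(1).unsqueeze(-1)                 # (B, HW, 1)
        f = feats.flatten(2).transpose(1, 2)                     # (B, HW, C)
        return (w * f).sum(1, keepdim=True) / w.sum(1, keepdim=True).clamp(min=1e-6)

    def forward(self, image, prompt_mask=None, text_emb=None):
        feats = self.image_encoder(image)                        # (B, C, H/16, W/16)
        tokens = feats.flatten(2).transpose(1, 2)                # (B, HW, C)
        q = self.queries.weight.unsqueeze(0).expand(image.size(0), -1, -1)
        # Compositionality: append whichever prompt embeddings are provided.
        extra = []
        if prompt_mask is not None:
            extra.append(self.encode_visual_prompt(feats, prompt_mask))
        if text_emb is not None:
            extra.append(self.text_proj(text_emb).unsqueeze(1))
        q = torch.cat([q] + extra, dim=1)
        out = self.decoder(q, tokens)                            # queries attend to image tokens
        mask_emb = self.mask_head(out)                           # (B, Q', C)
        masks = torch.einsum("bqc,bchw->bqhw", mask_emb, feats)  # per-query mask logits
        return masks, self.class_head(out)


if __name__ == "__main__":
    model = PromptableSegmenter()
    img = torch.randn(1, 3, 256, 256)
    click = torch.zeros(1, 16, 16); click[0, 8, 8] = 1.0         # a point prompt, rasterized
    txt = torch.randn(1, 512)                                    # a text-prompt embedding
    masks, logits = model(img, prompt_mask=click, text_emb=txt)
    print(masks.shape, logits.shape)
```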