Segment Everything Everywhere All at Once

Neural Information Processing Systems 

In this work, we present SEEM, a promptable and interactive model for segmenting everything everywhere all at once in an image, as shown in Figure 1. In SEEM, we propose a novel decoding mechanism that enables diverse prompting for all types of segmentation tasks, aiming at a universal segmentation interface that behaves like large language models (LLMs). More specifically, SEEM is designed with four desiderata: i) Versatility. We introduce a new visual prompt to unify different spatial queries including points, boxes, scribbles and masks, which can further generalize to a different referring image; ii) Compositionality. We learn a joint visual-semantic space between text and visual prompts, which facilitates the dynamic composition of two prompt types required for various segmentation tasks; iii) Interactivity. We further incorporate learnable memory prompts into the decoder to retain segmentation history via mask-guided cross-attention; iv) Semantic-awareness. We use a text encoder to encode text queries and mask labels into the same semantic space for open-vocabulary segmentation.
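To make the prompting mechanism concrete, below is a minimal PyTorch sketch of how spatial prompts (points, boxes, scribbles, masks) and text prompts could be embedded into one joint visual-semantic space and composed as decoder queries. All names (PromptUnifier, feat_dim, the pooling scheme) are illustrative assumptions for exposition, not SEEM's actual implementation.

```python
# A minimal sketch of SEEM-style unified prompting (illustrative only;
# module names, shapes, and the pooling scheme are assumptions, not the
# paper's actual code).
import torch
import torch.nn as nn

class PromptUnifier(nn.Module):
    """Maps spatial prompts and text prompts into one joint
    visual-semantic embedding space."""
    def __init__(self, feat_dim: int = 256, text_dim: int = 512):
        super().__init__()
        # Projects mask-pooled image features into the joint space.
        self.visual_proj = nn.Linear(feat_dim, feat_dim)
        # Stand-in projection for a frozen text encoder's output.
        self.text_proj = nn.Linear(text_dim, feat_dim)

    def encode_spatial(self, image_feats: torch.Tensor,
                       prompt_mask: torch.Tensor) -> torch.Tensor:
        # image_feats: (B, C, H, W); prompt_mask: (B, 1, H, W), a binary
        # map rasterized from a point, box, scribble, or mask prompt.
        masked = image_feats * prompt_mask
        pooled = masked.sum(dim=(2, 3)) / prompt_mask.sum(dim=(2, 3)).clamp(min=1.0)
        return self.visual_proj(pooled)   # (B, C) visual prompt embedding

    def encode_text(self, text_emb: torch.Tensor) -> torch.Tensor:
        # text_emb: (B, text_dim), e.g. from a CLIP-style text encoder.
        return self.text_proj(text_emb)   # (B, C) text prompt embedding

# Compositionality: both prompt types live in the same space, so they can
# simply be stacked as extra queries for the mask decoder.
unifier = PromptUnifier()
feats = torch.randn(1, 256, 64, 64)
box_mask = torch.zeros(1, 1, 64, 64)
box_mask[..., 20:40, 20:40] = 1.0                 # a box prompt as a mask
queries = torch.stack([unifier.encode_spatial(feats, box_mask),
                       unifier.encode_text(torch.randn(1, 512))], dim=1)
print(queries.shape)                              # (1, 2, 256)
```

Under this reading, interactivity falls out naturally: the mask predicted in one round can be rasterized back into a prompt mask for the next round, so dialog history re-enters the decoder through the same interface.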
