OpenWorldSAM: Extending SAM2 for Universal Image Segmentation with Language Prompts

Jun-11-2026, 12:34:40 GMT–Neural Information Processing Systems

The ability to segment objects based on open-ended language prompts remains a critical challenge, requiring models to ground textual semantics into precise spatial masks while handling diverse and unseen categories. We present OpenWorldSAM, a framework that extends the prompt-driven Segment Anything Model v2 (SAM2) to open-vocabulary scenarios by integrating multi-modal embeddings extracted from a lightweight vision-language model (VLM). Our approach is guided by four key principles: i) Unified prompting: OpenWorldSAM supports a diverse range of prompts, including category-level and sentence-level language descriptions, providing a flexible interface for various segmentation tasks.


