ROSE: Revolutionizing Open-Set Dense Segmentation with Patch-Wise Perceptual Large Multimodal Model

Han, Kunyang, Hu, Yibo, Qu, Mengxue, Shi, Hailin, Zhao, Yao, Wei, Yunchao

Dec-4-2024–arXiv.org Artificial Intelligence

Advances in CLIP and large multimodal models (LMMs) have enabled open-vocabulary and free-text segmentation, yet existing models still require predefined category prompts, limiting free-form category self-generation. Most segmentation LMMs also remain confined to sparse predictions, restricting their applicability in open-set environments. In contrast, we propose ROSE, a Revolutionary Open-set dense SEgmentation LMM, which enables dense mask prediction and open-category generation through patch-wise perception. Our method treats each image patch as an independent region of interest candidate, enabling the model to predict both dense and sparse masks simultaneously. Additionally, a newly designed instruction-response paradigm takes full advantage of the generation and generalization capabilities of LMMs, achieving category prediction independent of closed-set constraints or predefined categories. To further enhance mask detail and category precision, we introduce a conversation-based refinement paradigm, integrating the prediction result from previous step with textual prompt for revision. Extensive experiments demonstrate that ROSE achieves competitive performance across various segmentation tasks in a unified framework. Code will be released.

category, prediction, segmentation, (14 more...)

arXiv.org Artificial Intelligence

Dec-4-2024

arXiv.org PDF

Add feedback

Country:
- Asia
  - Middle East > Israel
    - Tel Aviv District > Tel Aviv (0.04)
  - China > Beijing
    - Beijing (0.04)

Genre:
- Research Report (0.50)

Technology:
- Information Technology
  - Sensing and Signal Processing > Image Processing (1.00)
  - Artificial Intelligence
    - Vision (1.00)
    - Natural Language
      - Large Language Model (1.00)
      - Chatbot (0.93)
    - Machine Learning > Neural Networks
      - Deep Learning (0.94)