MixPrompt: Efficient Mixed Prompting for Multimodal Semantic Segmentation

Jun-16-2026, 03:07:54 GMT–Neural Information Processing Systems

Recent advances in multimodal semantic segmentation show that incorporating auxiliary inputs, such as depth or thermal images, can significantly improve performance over single-modality (RGB-only) approaches. However, most existing solutions rely on parallel backbone networks and complex fusion modules, which greatly increase model size and computational demands. Inspired by prompt tuning in large language models, we introduce MixPrompt, a prompting-based framework that integrates auxiliary modalities into a pretrained RGB segmentation model without modifying its architecture. MixPrompt uses a lightweight prompting module to extract information from auxiliary inputs and fuse it into the main RGB backbone. This module is initialized using the early layers of a pretrained RGB feature extractor, ensuring a strong starting point.
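
The data flow described above can be illustrated with a toy sketch. It is not the authors' implementation: the abstract does not specify the fusion operator or the layer types, so the element-wise addition, the `Scale` stand-in stage, and the names `PromptedBackbone` and `num_prompt_stages` are all assumptions made for illustration. The sketch only shows the two ideas the abstract states: the prompting module starts as a copy of the backbone's early layers, and the backbone itself is left unchanged.

```python
import copy


class Scale:
    """Toy stand-in for one backbone stage (hypothetical): scales each feature."""

    def __init__(self, weight):
        self.weight = weight

    def __call__(self, features):
        return [self.weight * x for x in features]


class PromptedBackbone:
    """Wraps a pretrained RGB backbone with a small auxiliary prompting branch."""

    def __init__(self, stages, num_prompt_stages):
        # The pretrained RGB backbone is stored as-is; its structure is untouched.
        self.stages = stages
        # The prompting module is initialized as an independent copy of the
        # backbone's early stages, so it starts from pretrained weights.
        self.prompt_stages = copy.deepcopy(stages[:num_prompt_stages])

    def forward(self, rgb, aux):
        n = len(self.prompt_stages)
        # RGB features pass through the early backbone stages.
        for stage in self.stages[:n]:
            rgb = stage(rgb)
        # Auxiliary input (e.g. depth or thermal) passes through the prompt branch.
        for stage in self.prompt_stages:
            aux = stage(aux)
        # Assumed fusion: element-wise addition of prompt features into RGB features.
        fused = [r + a for r, a in zip(rgb, aux)]
        # The remaining backbone stages process the fused features.
        for stage in self.stages[n:]:
            fused = stage(fused)
        return fused


backbone = [Scale(2.0), Scale(3.0), Scale(0.5)]
model = PromptedBackbone(backbone, num_prompt_stages=1)
output = model.forward(rgb=[1.0, 2.0], aux=[0.5, 0.5])
```

Because the prompt branch is a deep copy, updating its weights during training would not alter the pretrained backbone stages, which is one plausible way to keep the added parameter count limited to the copied early layers.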

Country:
- Asia > China (0.67)

Genre:
- Research Report
  - New Finding (1.00)
  - Experimental Study (1.00)

Technology:
- Information Technology
  - Data Science (0.93)
  - Sensing and Signal Processing > Image Processing (0.93)
  - Artificial Intelligence
    - Vision (1.00)
    - Representation & Reasoning (1.00)
    - Natural Language (1.00)
    - Robots (0.93)
    - Machine Learning
      - Statistical Learning (1.00)
      - Neural Networks > Deep Learning (0.46)
