MixPrompt: Efficient Mixed Prompting for Multimodal Semantic Segmentation
Neural Information Processing Systems
Recent advances in multimodal semantic segmentation show that incorporating auxiliary inputs, such as depth or thermal images, can significantly improve performance over single-modality (RGB-only) approaches. However, most existing solutions rely on parallel backbone networks and complex fusion modules, greatly increasing model size and computational demands. Inspired by prompt tuning in large language models, we introduce MixPrompt, a prompting-based framework that integrates auxiliary modalities into a pretrained RGB segmentation model without modifying its architecture. MixPrompt uses a lightweight prompting module to extract information from auxiliary inputs and fuse it into the main RGB backbone. This module is initialized using the early layers of a pretrained RGB feature extractor, ensuring a strong starting point.
Jun-16-2026, 03:07:54 GMT
- Genre:
- Research Report
- New Finding (1.00)
- Experimental Study (1.00)
- Technology:
- Information Technology
- Data Science (0.93)
- Sensing and Signal Processing > Image Processing (0.93)
- Artificial Intelligence
- Vision (1.00)
- Representation & Reasoning (1.00)
- Natural Language (1.00)
- Robots (0.93)
- Machine Learning
- Statistical Learning (1.00)
- Neural Networks > Deep Learning (0.46)