Multimodal Feature Fusion Network with Text Difference Enhancement for Remote Sensing Change Detection

Zhou, Yijun, Zhai, Yikui, Ying, Zilu, Xian, Tingfeng, Zhou, Wenlve, Zhou, Zhiheng, Tian, Xiaolin, Jia, Xudong, Zhang, Hongsheng, Chen, C. L. Philip

arXiv.org Artificial Intelligence 

Although deep learning has advanced remote sensing change detection (RSCD), most methods rely solely on the image modality, which limits feature representation, change pattern modeling, and generalization, especially under illumination and noise disturbances. To address this, we propose MMChange, a multimodal RSCD method that combines image and text modalities to enhance accuracy and robustness. An Image Feature Refinement (IFR) module is introduced to highlight key regions and suppress environmental noise. To overcome the semantic limitations of image features, we employ a vision-language model (VLM) to generate semantic descriptions of bi-temporal images. To bridge the heterogeneity between modalities, we design an Image-Text Feature Fusion (ITFF) module that enables deep cross-modal integration. Extensive experiments on LEVIR-CD, WHU-CD, and SYSU-CD demonstrate that MMChange consistently surpasses state-of-the-art methods across multiple metrics, validating its effectiveness for multimodal RSCD.

Yijun Zhou, Yikui Zhai, Zilu Ying, and Tingfeng Xian are with the College of Electronics and Information Engineering, Wuyi University, Jiangmen, 529020, China (e-mail: 17346700814@163.com). Wenlve Zhou and Zhiheng Zhou are with the School of Electronic and Information Engineering and the Key Laboratory of Big Data and Intelligent Robot, Ministry of Education, South China University of Technology, Guangzhou, Guangdong 510641, China (e-mail: wenlvezhou@163.com). Xiaolin Tian is with the State Key Laboratory of Lunar and Planetary Sciences, Macau University of Science and Technology, Taipa, Macau (e-mail: xltian@must.edu.mo). Xudong Jia is with the College of Engineering and Computer Science, California State University, Northridge, 18111, America (e-mail: Xudong.Jia@csun.edu). Hongsheng Zhang is with the Department of Geography, The University of Hong Kong, Hong Kong, China (e-mail: zhanghs@hku.hk). C. L. Philip Chen is with the Faculty of Computer Science and Engineering, South China University of Technology, Guangzhou, Guangdong 510006, China (e-mail: philip.chen@ieee.org).