CMI-MTL: Cross-Mamba interaction based multi-task learning for medical visual question answering
Jin, Qiangguo, Zheng, Xianyao, Cui, Hui, Sun, Changming, Fang, Yuqi, Cong, Cong, Su, Ran, Wei, Leyi, Xuan, Ping, Wang, Junbo
arXiv.org Artificial Intelligence
Medical visual question answering (Med-VQA) is a crucial multimodal task in clinical decision support and telemedicine. Recent self-attention-based methods struggle to align cross-modal semantics between vision and language. Moreover, classification-based methods rely on predefined answer sets; treating Med-VQA as a simple classification problem may prevent a model from adapting to the diversity of free-form answers and cause it to overlook their detailed semantic information. To tackle these challenges, we introduce a Cross-Mamba Interaction based Multi-Task Learning (CMI-MTL) framework that learns cross-modal feature representations from images and texts. CMI-MTL comprises three key modules: fine-grained visual-text feature alignment (FVTA), cross-modal interleaved feature representation (CIFR), and free-form answer-enhanced multi-task learning (FFAE). FVTA extracts the most relevant regions in image-text pairs. CIFR captures cross-modal sequential interactions. FFAE leverages auxiliary knowledge from open-ended questions, improving the model's capability for open-ended Med-VQA. Experimental results show that CMI-MTL outperforms existing state-of-the-art methods on three Med-VQA datasets: VQA-RAD, SLAKE, and OVQA. Furthermore, we conduct additional interpretability experiments to demonstrate its effectiveness. The code is publicly available at https://github.com/BioMedIA-repo/CMI-MTL.
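As a rough illustration of the multi-task idea in the abstract, the sketch below combines a closed-set classification loss with an auxiliary loss over free-form answer tokens. It is not the authors' implementation: the function names, the token-level cross-entropy used for the auxiliary task, and the `aux_weight` parameter are all assumptions introduced here for illustration. The abstract does not state how the two objectives are actually combined.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Negative log-likelihood of the target class under softmax(logits)."""
    return -math.log(softmax(logits)[target_index])


def multitask_loss(cls_logits, cls_target, token_logits, token_targets, aux_weight=0.5):
    """Hypothetical combined objective (an assumption, not the paper's formula).

    cls_logits / cls_target: scores over a predefined answer set and the gold index.
    token_logits / token_targets: per-token vocabulary scores and gold token ids
        for the free-form answer (the auxiliary task).
    aux_weight: illustrative weight on the auxiliary term.
    """
    cls_loss = cross_entropy(cls_logits, cls_target)
    if token_targets:
        # Average the per-token losses so answer length does not dominate.
        aux_loss = sum(
            cross_entropy(logits, target)
            for logits, target in zip(token_logits, token_targets)
        ) / len(token_targets)
    else:
        # No free-form answer available: fall back to classification only.
        aux_loss = 0.0
    return cls_loss + aux_weight * aux_loss
```

With an empty auxiliary target the function reduces to plain classification, which mirrors the single-task baseline that the abstract contrasts against.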
Nov-4-2025
- Country:
  - Asia
    - China
      - Guangdong Province > Shantou (0.04)
      - Jiangsu Province > Nanjing (0.04)
      - Tianjin Province > Tianjin (0.04)
      - Macao (0.04)
  - Europe > Spain
    - Andalusia > Granada Province > Granada (0.04)
  - Oceania > Australia
    - New South Wales > Sydney (0.04)
    - Victoria > Melbourne (0.04)
- Genre:
  - Research Report
    - New Finding (0.48)
    - Promising Solution (0.34)
- Industry:
  - Health & Medicine > Diagnostic Medicine > Imaging (1.00)
- Technology: