MedAlign: A Synergistic Framework of Multimodal Preference Optimization and Federated Meta-Cognitive Reasoning

Chen, Siyong, Wen, Jinbo, Kang, Jiawen, Huang, Tenghui, Huang, Xumin, Su, Yuanjia, Pan, Hudan, Zhong, Zishao, Niyato, Dusit, Xie, Shengli, Kim, Dong In

Oct-27-2025, arXiv.org Artificial Intelligence

Abstract: Recently, large models have shown significant potential for smart healthcare. However, the deployment of Large Vision-Language Models (LVLMs) for clinical services is currently hindered by three critical challenges: a tendency to hallucinate answers not grounded in visual evidence, the inefficiency of fixed-depth reasoning, and the difficulty of multi-institutional collaboration. To address these challenges, in this paper, we develop MedAlign, a novel framework to ensure visually accurate LVLM responses for Medical Visual Question Answering (Med-VQA). Specifically, we first propose a multimodal Direct Preference Optimization (mDPO) objective to explicitly align preference learning with visual context. To achieve adaptive reasoning and facilitate multi-institutional collaboration, we propose a federated governance mechanism, where the selected expert, fine-tuned on clinical datasets based on mDPO, locally performs iterative Chain-of-Thought (CoT) reasoning via the local meta-cognitive uncertainty estimator. Extensive experiments on three representative Med-VQA datasets demonstrate that MedAlign achieves state-of-the-art performance, outperforming strong retrieval-augmented baselines by up to 11.85% in F1-score, and simultaneously reducing the average reasoning length by 51.60% compared with fixed-depth CoT approaches.

... Su, and S. Xie are with the School of Automation, Guangdong University of Technology, Guangzhou, China (e-mails: 3122000875@mail2.gdut.edu.cn, ...). J. Wen is with the College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China (e-mail: jinbo1608@nuaa.edu.cn). H. Pan and Z. Zhong are with the State Key Laboratory of Traditional Chinese Medicine Syndrome, The Second Affiliated Hospital of Guangzhou University of Chinese Medicine, Guangdong Provincial Hospital of Chinese Medicine, Guangdong Provincial Academy of Chinese Medical Sciences, Guangzhou, China, and Chinese Medicine Guangdong Laboratory, Zhuhai, China (e-mails: hdpan@gzucm.edu.cn, ...).

large language model, machine learning, natural language

Country:
- Asia > China
  - Guangdong Province > Guangzhou (0.64)
  - Jiangsu Province > Nanjing (0.44)

Genre:
- Research Report > New Finding (0.46)

Industry:
- Health & Medicine > Diagnostic Medicine > Imaging (0.67)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Representation & Reasoning (1.00)
  - Natural Language > Large Language Model (0.70)
  - Cognitive Science > Problem Solving (0.68)
  - Machine Learning
    - Neural Networks > Deep Learning (1.00)
    - Statistical Learning (0.67)
