MIRNet: Integrating Constrained Graph-Based Reasoning with Pre-training for Diagnostic Medical Imaging

Kong, Shufeng, Wang, Zijie, Cui, Nuan, Tang, Hao, Meng, Yihan, Wei, Yuanyuan, Chen, Feifan, Wang, Yingheng, Cai, Zhuo, Wang, Yaonan, Zhang, Yulong, Li, Yuzheng, Zheng, Zibin, Liu, Caihua, Liang, Hao

arXiv.org Artificial Intelligence 

We introduce MIRNet (Medical Image Reasoner Network), a novel framework that integrates self-supervised pre-training with constrained graph-based reasoning. Tongue image diagnosis is a particularly challenging domain that requires fine-grained visual and semantic understanding. Our approach leverages a self-supervised masked autoencoder (MAE) to learn transferable visual representations from unlabeled data; employs graph attention networks (GAT) to model label correlations through expert-defined structured graphs; enforces clinical priors via constraint-aware optimization using KL divergence and regularization losses; and mitigates class imbalance using asymmetric loss (ASL) and boosting ensembles. To address annotation scarcity, we also introduce TongueAtlas-4K, a comprehensive expert-curated benchmark comprising 4,000 images annotated with 22 diagnostic labels, representing the largest public dataset in tongue analysis. Validation shows that our method achieves state-of-the-art performance.
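
To make the imbalance-handling component concrete, the following is a minimal sketch of an asymmetric loss for multi-label classification, in its commonly published form. It is not taken from MIRNet itself: the function name `asymmetric_loss`, the hyperparameter values (`gamma_pos`, `gamma_neg`, `margin`), and the plain-Python formulation are illustrative assumptions, and the abstract does not state which settings the authors use.

```python
import math

def asymmetric_loss(probs, targets, gamma_pos=0.0, gamma_neg=4.0,
                    margin=0.05, eps=1e-8):
    """Sum of per-label asymmetric losses for one multi-label sample.

    probs   -- predicted probabilities in [0, 1], one per label
    targets -- 0/1 ground-truth indicators, one per label
    All hyperparameter defaults are illustrative assumptions.
    """
    total = 0.0
    for p, y in zip(probs, targets):
        if y == 1:
            # Positive label: focal-style term, usually with weak focusing.
            total += -((1.0 - p) ** gamma_pos) * math.log(max(p, eps))
        else:
            # Negative label: shift the probability down by the margin so
            # that very easy negatives contribute exactly zero loss.
            p_m = max(p - margin, 0.0)
            # Stronger focusing (gamma_neg) down-weights the many easy negatives.
            total += -(p_m ** gamma_neg) * math.log(max(1.0 - p_m, eps))
    return total

# Example: one present label predicted at 0.7, two absent labels.
loss = asymmetric_loss([0.7, 0.03, 0.6], [1, 0, 0])
```

With `gamma_pos`, `gamma_neg`, and `margin` all set to zero, the function reduces to ordinary binary cross-entropy summed over labels, which is a convenient sanity check. The asymmetry comes from treating negatives more aggressively than positives, which suits settings where most of the diagnostic labels are absent in any given image.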