MMLNB: Multi-Modal Learning for Neuroblastoma Subtyping Classification Assisted with Textual Description Generation

Chen, Huangwei, Chen, Yifei, Yan, Zhenyu, Ding, Mingyang, Li, Chenlei, Zhu, Zhu, Qin, Feiwei

Mar-19-2025–arXiv.org Artificial Intelligence

Neuroblastoma (NB), a leading cause of childhood cancer mortality, exhibits significant histopathological variability, necessitating precise subtyping for accurate prognosis and treatment. Traditional diagnostic methods rely on subjective evaluations that are time-consuming and inconsistent. To address these challenges, we introduce MMLNB, a multi-modal learning (MML) model that integrates pathological images with generated textual descriptions to improve classification accuracy and interpretability. The approach follows a two-stage process. First, we fine-tune a Vision-Language Model (VLM) to enhance pathology-aware text generation. Second, the fine-tuned VLM generates textual descriptions, and a dual-branch architecture independently extracts visual and textual features. These features are fused via a Progressive Robust Multi-Modal Fusion (PRMF) block for stable training. Experimental results show that the MMLNB model is more accurate than single-modal models. Ablation studies demonstrate the importance of multi-modal fusion, fine-tuning, and the PRMF mechanism. This research creates a scalable AI-driven framework for digital pathology, enhancing reliability and interpretability in NB subtyping classification. Our source code is available at https://github.com/HovChen/MMLNB.
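
To make the dual-branch idea concrete, the following is a minimal, dependency-free sketch of a classifier that projects image features and text features in separate branches and then fuses them. It is an illustration only and not the authors' implementation: the abstract does not describe the internals of the PRMF block, so plain concatenation stands in for the fusion step, and the class name `DualBranchClassifier`, the feature dimensions, the number of classes, and the random weights are all hypothetical placeholders.

```python
import math
import random


def linear(x, weights, bias):
    """Dense layer: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(rng, n_out, n_in):
    """Small random weights and zero biases (placeholder initialisation)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class DualBranchClassifier:
    """Hypothetical two-branch model; NOT the MMLNB architecture itself."""

    def __init__(self, image_dim, text_dim, hidden_dim, num_classes, seed=0):
        rng = random.Random(seed)
        self.image_proj = init_layer(rng, hidden_dim, image_dim)
        self.text_proj = init_layer(rng, hidden_dim, text_dim)
        self.head = init_layer(rng, num_classes, 2 * hidden_dim)

    def forward(self, image_features, text_features):
        # Each branch processes its own modality independently.
        h_img = [math.tanh(v) for v in linear(image_features, *self.image_proj)]
        h_txt = [math.tanh(v) for v in linear(text_features, *self.text_proj)]
        # Assumption: simple concatenation stands in for the PRMF block.
        fused = h_img + h_txt
        return softmax(linear(fused, *self.head))


# Placeholder inputs standing in for encoder outputs of an image and its
# generated description; the sizes and class count are arbitrary.
model = DualBranchClassifier(image_dim=8, text_dim=4, hidden_dim=6, num_classes=3)
probs = model.forward([0.5] * 8, [1.0] * 4)
```

In a real pipeline the two input vectors would come from a visual encoder and a text encoder applied to the pathology image and the VLM-generated description; the authors' repository is the reference for how MMLNB actually does this.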

classification, machine learning, natural language

Country:
- Asia > China > Zhejiang Province (0.29)

Genre:
- Research Report
  - Experimental Study (0.68)
  - New Finding (0.88)

Industry:
- Health & Medicine > Therapeutic Area > Oncology > Childhood Cancer (1.00)

Technology:
- Information Technology
  - Artificial Intelligence
    - Machine Learning
      - Neural Networks > Deep Learning (1.00)
      - Performance Analysis > Accuracy (0.66)
    - Natural Language (1.00)
    - Representation & Reasoning (0.93)
    - Vision (1.00)
  - Sensing and Signal Processing > Image Processing (1.00)