FLUID: Flow-Latent Unified Integration via Token Distillation for Expert Specialization in Multimodal Learning
Van Duc Cuong, Ta Dinh Tam, Tran Duc Chinh, Nguyen Thi Hanh
arXiv.org Artificial Intelligence
Multimodal classification requires robust integration of visual and textual signals, yet common fusion strategies are brittle and vulnerable to modality-specific noise. In this paper, we present FLUID (Flow-Latent Unified Integration via Token Distillation for Expert Specialization), a principled token-level pipeline that improves cross-modal robustness and scalability. FLUID contributes three core elements: (1) Q-transforms, learnable query tokens that distill and retain salient token-level features from modality-specific backbones; (2) a two-stage fusion scheme that enforces cross-modal consistency via contrastive alignment and then performs adaptive, task-aware fusion through a gating mechanism and a Q-bottleneck that selectively compresses information for downstream reasoning; and (3) a lightweight, load-balanced Mixture-of-Experts at prediction time that enables efficient specialization to diverse semantic patterns. Extensive experiments demonstrate that FLUID attains 91% accuracy on the GLAMI-1M benchmark, significantly outperforming prior baselines and exhibiting strong resilience to label noise, long-tail class imbalance, and semantic heterogeneity. Targeted ablation studies corroborate both the individual and synergistic benefits of the proposed components, positioning FLUID as a scalable, noise-resilient solution for multimodal product classification.
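The core distillation idea behind the Q-transforms can be sketched in a few lines: a small set of learnable query tokens cross-attends to the (variable-length) token sequence from a modality backbone and compresses it into a fixed-size summary. The sketch below is a minimal single-head, NumPy-only illustration under our own assumptions; the function name `q_transform`, the query count, and the dimensions are illustrative, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def q_transform(tokens, queries):
    """Distill a variable-length token sequence into a fixed set of
    query tokens via single-head cross-attention (illustrative sketch).

    tokens:  (n_tokens, d)  token features from a modality backbone
    queries: (n_queries, d) learnable query tokens (parameters in practice)
    returns: (n_queries, d) distilled summary tokens
    """
    d = tokens.shape[-1]
    # Scaled dot-product attention: each query attends over all tokens.
    attn = softmax(queries @ tokens.T / np.sqrt(d))  # (n_queries, n_tokens)
    return attn @ tokens

rng = np.random.default_rng(0)
vis_tokens = rng.normal(size=(197, 64))    # e.g. ViT-style patch tokens
queries = rng.normal(size=(8, 64)) * 0.02  # 8 hypothetical learnable queries
distilled = q_transform(vis_tokens, queries)
print(distilled.shape)  # fixed-size output regardless of n_tokens
```

Because the output size is fixed by the number of queries, the downstream fusion and Mixture-of-Experts stages see a constant-shaped input from each modality, which is what makes token-level fusion tractable at scale.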
Aug-18-2025