Co-AttenDWG: Co-Attentive Dimension-Wise Gating and Expert Fusion for Multi-Modal Offensive Content Detection
Hossain, Md. Mithun, Hossain, Md. Shakil, Chaki, Sudipto, Mridha, M. F.
arXiv.org Artificial Intelligence
Multi-modal learning has emerged as a crucial research direction, as integrating textual and visual information can substantially enhance performance in tasks such as classification, retrieval, and scene understanding. Despite advances with large pre-trained models, existing approaches often suffer from insufficient cross-modal interaction and rigid fusion strategies, and so fail to fully harness the complementary strengths of the different modalities. To address these limitations, we propose Co-AttenDWG, an architecture that combines co-attention, dimension-wise gating, and expert fusion. Our approach first projects textual and visual features into a shared embedding space, where a dedicated co-attention mechanism enables simultaneous, fine-grained interactions between modalities. This is further strengthened by a dimension-wise gating network, which adaptively modulates feature contributions at the channel level to emphasize salient information. In parallel, dual-path encoders independently refine modality-specific representations, while an additional cross-attention layer further aligns the modalities. The resulting features are aggregated by an expert fusion module that integrates learned gating and self-attention, yielding a robust unified representation. Experimental results on the MIMIC and SemEval Memotion 1.0 datasets show that Co-AttenDWG achieves state-of-the-art performance and superior cross-modal alignment, highlighting its effectiveness for diverse multi-modal applications.
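The abstract gives no equations, so the following is only a minimal sketch of two of the ideas it names: co-attention between modalities and dimension-wise gating. It assumes plain scaled dot-product attention in which each modality queries the other, and a sigmoid gate computed independently for every feature dimension. The function names, the mean pooling step, and the gate parameters `w_t`, `w_v`, and `bias` are hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys):
    """Scaled dot-product attention; the keys also serve as the values."""
    dim = len(queries[0])
    scale = math.sqrt(dim)
    attended = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        attended.append(
            [sum(w * k[d] for w, k in zip(weights, keys)) for d in range(dim)]
        )
    return attended


def co_attention(text, image):
    """Assumed form of co-attention: each modality attends over the other."""
    return attend(text, image), attend(image, text)


def mean_pool(rows):
    """Average a sequence of vectors into one vector (an assumed pooling step)."""
    return [sum(r[d] for r in rows) / len(rows) for d in range(len(rows[0]))]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dimension_wise_gate(t, v, w_t, w_v, bias):
    """Blend two vectors with one gate per feature dimension.

    Assumed form: g[d] = sigmoid(w_t[d] * t[d] + w_v[d] * v[d] + bias[d]),
    fused[d] = g[d] * t[d] + (1 - g[d]) * v[d].
    """
    gates = [
        sigmoid(wt * a + wv * b + c)
        for a, b, wt, wv, c in zip(t, v, w_t, w_v, bias)
    ]
    fused = [g * a + (1.0 - g) * b for g, a, b in zip(gates, t, v)]
    return fused, gates


# Toy inputs: 2 text tokens and 3 image regions in a shared 3-dimensional space.
text = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]
image = [[0.5, 0.5, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]

text_ctx, image_ctx = co_attention(text, image)
fused, gates = dimension_wise_gate(
    mean_pool(text_ctx),
    mean_pool(image_ctx),
    w_t=[1.0, 1.0, 1.0],
    w_v=[-1.0, -1.0, -1.0],
    bias=[0.0, 0.0, 0.0],
)
```

Because each gate lies strictly between 0 and 1, every fused dimension is a convex combination of the corresponding text and image features, which is one simple way to read "modulates feature contributions at the channel level". A trained model would learn the gate parameters and would typically use learned query, key, and value projections rather than the raw features used here.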
Jul-31-2025
- Country:
- Asia > Bangladesh > Dhaka Division > Dhaka District > Dhaka (0.04)
- Genre:
- Research Report > New Finding (0.93)
- Technology:
- Information Technology
- Artificial Intelligence
- Machine Learning
- Neural Networks > Deep Learning (0.68)
- Performance Analysis > Accuracy (0.47)
- Natural Language (1.00)
- Representation & Reasoning > Information Fusion (0.68)
- Vision (1.00)
- Communications > Social Media (0.94)
- Data Science > Data Mining (0.93)