Co-AttenDWG: Co-Attentive Dimension-Wise Gating and Expert Fusion for Multi-Modal Offensive Content Detection

Hossain, Md. Mithun, Hossain, Md. Shakil, Chaki, Sudipto, Mridha, M. F.

Jul-31-2025, arXiv.org Artificial Intelligence

Multi-modal learning has emerged as a crucial research direction, as integrating textual and visual information can substantially enhance performance in tasks such as classification, retrieval, and scene understanding. Despite advances with large pre-trained models, existing approaches often suffer from insufficient cross-modal interaction and rigid fusion strategies, and so fail to fully exploit the complementary strengths of the modalities. To address these limitations, we propose Co-AttenDWG, an architecture that combines co-attention, dimension-wise gating, and expert fusion. Our approach first projects textual and visual features into a shared embedding space, where a dedicated co-attention mechanism enables simultaneous, fine-grained interactions between the modalities. A dimension-wise gating network then adaptively modulates feature contributions at the channel level to emphasize salient information. In parallel, dual-path encoders independently refine modality-specific representations, and an additional cross-attention layer further aligns the modalities. The resulting features are aggregated by an expert fusion module that integrates learned gating and self-attention, yielding a robust unified representation. Experimental results on the MIMIC and SemEval Memotion 1.0 datasets show that Co-AttenDWG achieves state-of-the-art performance and superior cross-modal alignment, highlighting its effectiveness for diverse multi-modal applications.
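
To make the co-attention and dimension-wise gating steps concrete, the following is a minimal pure-Python sketch, not the authors' implementation. It assumes the text and image features have already been projected into a shared space of width `d`, uses a scaled dot-product affinity for co-attention, and uses a sigmoid gate computed from the concatenated features; the function names, the gate parameterization, and the random toy inputs are all hypothetical choices made for illustration.

```python
import math
import random


def matmul(a, b):
    # (n, k) x (k, m) -> (n, m) on nested lists
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def co_attention(text, image):
    """Each modality attends to the other through one shared affinity matrix.

    text: (n_t, d) token features, image: (n_v, d) patch features.
    Returns image-informed text context (n_t, d) and
    text-informed image context (n_v, d).
    """
    d = len(text[0])
    scale = math.sqrt(d)
    affinity = [[v / scale for v in row] for row in matmul(text, transpose(image))]
    text_ctx = matmul([softmax(r) for r in affinity], image)
    image_ctx = matmul([softmax(r) for r in transpose(affinity)], text)
    return text_ctx, image_ctx


def dimension_wise_gate(own, ctx, weight, bias):
    """Blend own and cross-modal features with one gate value per dimension.

    Assumed form: g = sigmoid([own; ctx] W + b), out = g * own + (1 - g) * ctx.
    weight: (2d, d), bias: (d,).
    """
    outputs, gates = [], []
    for o, c in zip(own, ctx):
        logits = matmul([o + c], weight)[0]  # o + c concatenates the two lists
        g = [1.0 / (1.0 + math.exp(-(z + b))) for z, b in zip(logits, bias)]
        gates.append(g)
        outputs.append([gi * oi + (1.0 - gi) * ci for gi, oi, ci in zip(g, o, c)])
    return outputs, gates


# Toy inputs (hypothetical): 5 text tokens and 3 image patches, width d = 8.
rng = random.Random(0)
d = 8
text = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(5)]
image = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(3)]
weight = [[rng.gauss(0, 0.1) for _ in range(d)] for _ in range(2 * d)]
bias = [0.0] * d

text_ctx, image_ctx = co_attention(text, image)
text_out, text_gate = dimension_wise_gate(text, text_ctx, weight, bias)
image_out, image_gate = dimension_wise_gate(image, image_ctx, weight, bias)
```

Because the gate holds a separate value for every feature dimension, each channel of a token can lean toward its own modality or toward the cross-modal context independently, which is the channel-level modulation the abstract describes. The dual-path encoders, the extra cross-attention layer, and the expert fusion module are not covered by this sketch.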
