MM-HSD: Multi-Modal Hate Speech Detection in Videos
Céspedes-Sarrias, Berta, Collado-Capell, Carlos, Rodenas-Ruiz, Pablo, Hrynenko, Olena, Cavallaro, Andrea
arXiv.org Artificial Intelligence
While hate speech detection (HSD) has been extensively studied in text, existing multi-modal approaches remain limited, particularly in videos. As modalities are not always individually informative, simple fusion methods fail to fully capture inter-modal dependencies. Moreover, previous work often omits relevant modalities such as on-screen text and audio, which may contain subtle hateful content and thus provide essential cues, both individually and in combination with others. In this paper, we present MM-HSD, a multi-modal model for HSD in videos that integrates video frames, audio, and text derived from speech transcripts and from frames (i.e., on-screen text) together with features extracted by Cross-Modal Attention (CMA). We are the first to use CMA as an early feature extractor for HSD in videos, to systematically compare query/key configurations, and to evaluate the interactions between different modalities in the CMA block. Our approach leads to improved performance when on-screen text is used as a query and the rest of the modalities serve as a key. Experiments on the HateMM dataset show that MM-HSD outperforms state-of-the-art methods on M-F1 score (0.874), using concatenation of transcript, audio, video, on-screen text, and CMA for feature extraction on raw embeddings of the modalities. The code is available at https://github.com/idiap/mm-hsd
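The query/key configuration described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (see the linked repository for that). It assumes one embedding vector per modality, already projected to a shared dimension, a single attention head, and random projection matrices standing in for learned weights. The names `cross_modal_attention`, `ocr`, `context`, and `fused` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_modal_attention(query, context, w_q, w_k, w_v):
    """Single-head scaled dot-product attention: `query` attends to `context`.

    query:   (n_q, dim) embeddings of the querying modality
    context: (n_c, dim) embeddings of the modalities used as keys/values
    """
    q = query @ w_q
    k = context @ w_k
    v = context @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (n_q, n_c)
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ v, weights


rng = np.random.default_rng(0)
dim = 8  # assumed shared embedding size (illustrative only)

# Hypothetical per-modality embeddings, one vector per video.
transcript = rng.normal(size=(1, dim))
audio = rng.normal(size=(1, dim))
video = rng.normal(size=(1, dim))
ocr = rng.normal(size=(1, dim))  # on-screen text

# Random matrices stand in for learned projections.
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) for _ in range(3))

# On-screen text is the query; the remaining modalities are stacked as keys.
context = np.concatenate([transcript, audio, video], axis=0)  # (3, dim)
cma_feat, attn = cross_modal_attention(ocr, context, w_q, w_k, w_v)

# Concatenate the raw modality embeddings with the CMA feature.
fused = np.concatenate([transcript, audio, video, ocr, cma_feat], axis=-1)
```

In this sketch `fused` would feed a downstream classifier, and `attn` shows how much weight the on-screen text places on each of the other three modalities.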
Aug-29-2025
- Country:
- Africa (0.04)
- Asia
- Japan > Honshū
- Kansai > Kyoto Prefecture > Kyoto (0.04)
- Middle East > Israel
- Tel Aviv District > Tel Aviv (0.04)
- Thailand > Bangkok
- Bangkok (0.04)
- Europe
- Belgium > Brussels-Capital Region
- Brussels (0.04)
- France
- Ireland (0.04)
- Italy > Tuscany
- Florence (0.04)
- Switzerland > Vaud
- Lausanne (0.04)
- United Kingdom > England
- West Midlands > Birmingham (0.04)
- North America > United States
- Connecticut (0.04)
- Florida > Miami-Dade County
- Miami (0.04)
- Nevada > Clark County
- Las Vegas (0.04)
- New York > New York County
- New York City (0.04)
- Oceania > Australia
- New South Wales > Sydney (0.04)
- Victoria > Melbourne (0.04)
- Genre:
- Research Report > Promising Solution (0.34)
- Industry:
- Information Technology (0.93)
- Law Enforcement & Public Safety > Terrorism (0.67)
- Leisure & Entertainment (1.00)
- Media (0.97)
- Technology: