The slopaganda era: 10 AI images posted by the White House - and what they teach us

The Guardian

May the 4th be with you: the White House celebrates Star Wars Day. Under Donald Trump, the White House has filled its social media with memes, wishcasting, nostalgia and deepfakes. Here's what you need to know to navigate the trolling. It started with an image of Trump as a king mocked up on a fake Time magazine cover. Since then it's developed into a full-blown phenomenon, one academics are calling "slopaganda" - an unholy alliance of easily available AI tools and political messaging.


Why Everyone Is Suddenly in a 'Very Chinese Time' in Their Lives

WIRED

It's a symbol of what Americans believe their own country has lost. In case you didn't get the memo, everyone is feeling very Chinese these days. Across social media, people are proclaiming that "You met me at a very Chinese time of my life," while performing stereotypically Chinese-coded activities like eating dim sum or wearing the viral Adidas Chinese jacket. The trend blew up so much in recent weeks that celebrities like comedian Jimmy O Yang and influencer Hasan Piker even got in on it. It has now evolved into variations like "Chinamaxxing" (acting increasingly more Chinese) and "u will turn Chinese tomorrow" (a kind of affirmation or blessing).


CAuSE: Decoding Multimodal Classifiers using Faithful Natural Language Explanation

Bandyopadhyay, Dibyanayan, Bhattacharjee, Soham, Hasanuzzaman, Mohammed, Ekbal, Asif

arXiv.org Artificial Intelligence

Multimodal classifiers function as opaque black-box models. While several techniques exist to interpret their predictions, very few are as intuitive and accessible as natural language explanations (NLEs). To build trust, such explanations must faithfully capture the classifier's internal decision-making behavior, a property known as faithfulness. In this paper, we propose CAuSE (Causal Abstraction under Simulated Explanations), a novel framework to generate faithful NLEs for any pretrained multimodal classifier. We demonstrate that CAuSE generalizes across datasets and models through extensive empirical evaluations. Theoretically, we show that CAuSE, trained via interchange intervention, forms a causal abstraction of the underlying classifier. We further validate this through a redesigned metric for measuring causal faithfulness in multimodal settings. CAuSE surpasses other methods on this metric, with qualitative analysis reinforcing its advantages. We perform detailed error analysis to pinpoint the failure cases of CAuSE. For replicability, we make the code available at https://github.com/newcodevelop/CAuSE
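The core of an interchange intervention can be sketched on a toy model: run the classifier on one input, but patch in an intermediate activation computed from a second input; if the explanation model is a faithful causal abstraction, it predicts how the output changes. The two-stage "classifier" below is a made-up illustration, not CAuSE's actual architecture.

```python
# Toy interchange intervention on a tiny two-stage "classifier"
# (hypothetical model; only the patching idea mirrors the paper).
def encoder(x):
    # Intermediate causal variable: sign of the input.
    return 1 if x >= 0 else -1

def head(h):
    # Final prediction from the intermediate variable.
    return "positive" if h == 1 else "negative"

def classify(x):
    return head(encoder(x))

def interchange(x_base, x_source):
    """Run the model on x_base, but swap in the intermediate
    activation computed from x_source (the interchange intervention)."""
    h_patched = encoder(x_source)
    return head(h_patched)

# A faithful abstraction must predict that the patched run follows the
# source input's intermediate variable, not the base input's.
result = interchange(3.0, -2.0)
```

Under this patch, the base input 3.0 would normally classify as "positive", but the swapped-in intermediate from -2.0 flips the output, which is exactly the counterfactual behavior a causal abstraction has to reproduce.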


ExPO-HM: Learning to Explain-then-Detect for Hateful Meme Detection

Mei, Jingbiao, Sun, Mingsheng, Chen, Jinghong, Qin, Pengda, Li, Yuhong, Chen, Da, Byrne, Bill

arXiv.org Artificial Intelligence

Hateful memes have emerged as a particularly challenging form of online abuse, motivating the development of automated detection systems. Most prior approaches rely on direct detection, producing only binary predictions. Such models fail to provide the context and explanations that real-world moderation requires. Recent Explain-then-Detect approaches, using Chain-of-Thought prompting or LMM agents, perform worse than simple SFT baselines, and even advanced post-training methods such as GRPO fail to close the gap. Our analysis identifies two key issues of such systems: important policy-relevant cues such as targets and attack types are not hypothesized by the model as a likely explanation; and the binary reward signal is insufficient to guide reasoning. To address these challenges, we propose ExPO-HM (Explain-then-Detect Policy Optimization for Hateful Memes), inspired by the training and evaluation process of human annotators. ExPO-HM combines SFT warmup, GRPO with curriculum learning, and Conditional Decision Entropy (CDE) as both metric and reward for reasoning quality. Across three hateful meme benchmarks, ExPO-HM achieves state-of-the-art performance on binary detection, fine-grained classification, and reasoning quality, with up to 15% and 17% F1 improvement over the GRPO and DPO baselines, respectively. By moving hateful meme detection from simple binary alarms to explanation-driven detection, ExPO-HM provides accurate, interpretable, and actionable moderation support.
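The intuition behind an entropy-based reasoning reward can be sketched in a few lines: measure the Shannon entropy of the binary decision distribution conditioned on the generated reasoning, and reward reasoning that leaves the decision nearly deterministic. The `cde_reward` shaping below is a hypothetical illustration of that idea, not the paper's exact CDE formula.

```python
import math

def decision_entropy(p_hateful):
    """Shannon entropy (in bits) of a binary decision distribution,
    clamped away from 0/1 for numerical safety."""
    p = min(max(p_hateful, 1e-9), 1.0 - 1e-9)
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))

def cde_reward(p_with_reasoning, correct):
    """Hypothetical reward shaping: correct answers earn more when the
    reasoning makes the decision confident (low conditional entropy);
    incorrect confident answers are penalized symmetrically."""
    confidence = 1.0 - decision_entropy(p_with_reasoning)
    return confidence if correct else -confidence
```

A maximally uncertain decision (p = 0.5) has entropy 1 bit and thus zero reward either way, while reasoning that drives the decision probability toward 0 or 1 earns reward proportional to how decisive it is.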


Enhancing Meme Emotion Understanding with Multi-Level Modality Enhancement and Dual-Stage Modal Fusion

Shi, Yi, Meng, Wenlong, Guo, Zhenyuan, Wei, Chengkun, Chen, Wenzhi

arXiv.org Artificial Intelligence

With the rapid rise of social media and Internet culture, memes have become a popular medium for expressing emotional tendencies. This has sparked growing interest in Meme Emotion Understanding (MEU), which aims to classify the emotional intent behind memes by leveraging their multimodal contents. While existing efforts have achieved promising results, two major challenges remain: (1) a lack of fine-grained multimodal fusion strategies, and (2) insufficient mining of memes' implicit meanings and background knowledge. To address these challenges, we propose MemoDetector, a novel framework for advancing MEU. First, we introduce a four-step textual enhancement module that utilizes the rich knowledge and reasoning capabilities of Multimodal Large Language Models (MLLMs) to progressively infer and extract implicit and contextual insights from memes. These enhanced texts significantly enrich the original meme contents and provide valuable guidance for downstream classification. Next, we design a dual-stage modal fusion strategy: the first stage performs shallow fusion on raw meme image and text, while the second stage deeply integrates the enhanced visual and textual features. This hierarchical fusion enables the model to better capture nuanced cross-modal emotional cues. Experiments on two datasets, MET-MEME and MOOD, demonstrate that our method consistently outperforms state-of-the-art baselines. Specifically, MemoDetector improves F1 scores by 4.3% on MET-MEME and 3.4% on MOOD. Further ablation studies and in-depth analyses validate the effectiveness and robustness of our approach, highlighting its strong potential for advancing MEU. Our code is available at https://github.com/singing-cat/MemoDetector.
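The dual-stage fusion idea can be sketched minimally: stage 1 shallowly combines the raw image and text features, and stage 2 mixes the MLLM-enhanced features before joining them with the stage-1 output. Plain Python lists stand in for tensors here, and the fixed 0.5 gate is a placeholder for a learned gating parameter; none of this is the paper's actual architecture.

```python
def shallow_fuse(img_feat, txt_feat):
    """Stage 1: shallow fusion of the raw image and text features
    (concatenation stands in for the first-stage fusion)."""
    return img_feat + txt_feat

def deep_fuse(enh_img_feat, enh_txt_feat, shallow_feat, gate=0.5):
    """Stage 2: deeply integrate the enhanced visual and textual features
    via a gated elementwise mix, then combine with the stage-1 output."""
    mixed = [gate * a + (1.0 - gate) * b
             for a, b in zip(enh_img_feat, enh_txt_feat)]
    return mixed + shallow_feat

# Usage: the stage-1 representation feeds into stage 2 alongside the
# MLLM-enhanced features, giving the hierarchical fused representation.
stage1 = shallow_fuse([0.1, 0.2], [0.3, 0.4])
fused = deep_fuse([1.0, 2.0], [3.0, 4.0], stage1)
```

The point of the hierarchy is that the classifier sees both a cheap fusion of the raw modalities and a deeper fusion of the knowledge-enriched ones.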


TRACE: Textual Relevance Augmentation and Contextual Encoding for Multimodal Hate Detection

Koushik, Girish A., Treharne, Helen, Joshi, Aditya, Kanojia, Diptesh

arXiv.org Artificial Intelligence

Social media memes are a challenging domain for hate detection because they intertwine visual and textual cues into culturally nuanced messages. To tackle these challenges, we introduce TRACE, a hierarchical multimodal framework that leverages visually grounded context augmentation, along with a novel caption-scoring network to emphasize hate-relevant content, and parameter-efficient fine-tuning of CLIP's text encoder. Our experiments demonstrate that selectively fine-tuning deeper text encoder layers significantly enhances performance compared to simpler projection-layer fine-tuning methods. Specifically, our framework achieves state-of-the-art accuracy (0.807) and F1-score (0.806) on the widely-used Hateful Memes dataset, matching the performance of considerably larger models while maintaining efficiency. Moreover, it achieves superior generalization on the MultiOFF offensive meme dataset (F1-score 0.673), highlighting robustness across meme categories. Additional analyses confirm that robust visual grounding and nuanced text representations significantly reduce errors caused by benign confounders. We publicly release our code to facilitate future research.
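A caption-scoring step of the kind the abstract describes can be sketched as a softmax over per-caption relevance scores, used to weight the caption embeddings so hate-relevant captions dominate the pooled representation. This is an illustrative stand-in for TRACE's scoring network, not its actual implementation.

```python
import math

def caption_weights(relevance_scores):
    """Softmax the raw relevance scores into weights summing to 1,
    so the most hate-relevant captions dominate."""
    exps = [math.exp(s) for s in relevance_scores]
    total = sum(exps)
    return [e / total for e in exps]

def pool_captions(caption_embs, relevance_scores):
    """Relevance-weighted sum of caption embeddings (lists stand in
    for tensors; a learned scorer would produce the raw scores)."""
    weights = caption_weights(relevance_scores)
    dim = len(caption_embs[0])
    return [sum(w * emb[d] for w, emb in zip(weights, caption_embs))
            for d in range(dim)]
```

With equal scores every caption contributes equally; as one caption's relevance score grows, the pooled embedding converges toward that caption's embedding.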


Leveraging LLMs for Context-Aware Implicit Textual and Multimodal Hate Speech Detection

Brook, Joshua Wolfe, Markov, Ilia

arXiv.org Artificial Intelligence

This research introduces a novel approach to textual and multimodal Hate Speech Detection (HSD), using Large Language Models (LLMs) as dynamic knowledge bases to generate background context and incorporate it into the input of HSD classifiers. Two context generation strategies are examined: one focused on named entities and the other on full-text prompting. Four methods of incorporating context into the classifier input are compared: text concatenation, embedding concatenation, a hierarchical transformer-based fusion, and LLM-driven text enhancement. Experiments are conducted on the textual Latent Hatred dataset of implicit hate speech and applied in a multimodal setting on the MAMI dataset of misogynous memes. Results suggest that both the contextual information and the method by which it is incorporated are key, with gains of up to 3 and 6 F1 points on textual and multimodal setups respectively, from a zero-context baseline to the highest-performing system, based on embedding concatenation.
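The two simplest of the four context-incorporation methods compared above can be sketched directly; the `[SEP]` marker and list-based embeddings below are illustrative assumptions, not the paper's exact formatting.

```python
def text_concat(post, context):
    """Strategy 1: prepend the LLM-generated background context to the
    raw post text before tokenization ([SEP] is an illustrative
    separator, not necessarily the one used in the paper)."""
    return f"{context} [SEP] {post}"

def embedding_concat(post_emb, ctx_emb):
    """Strategy 2: concatenate fixed-size embeddings of the post and
    the generated context before the classifier head (list
    concatenation stands in for a tensor concat)."""
    return post_emb + ctx_emb
```

Embedding concatenation keeps the two signals in separate subspaces of the classifier input, which is one plausible reason it was the highest-performing incorporation method in the reported experiments.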