Caption Enriched Samples for Improving Hateful Memes Detection
Blaier, Efrat; Malkiel, Itzik; Wolf, Lior
arXiv.org Artificial Intelligence
The recently introduced Hateful Memes Challenge demonstrates how difficult it is to determine whether a meme is hateful. Specifically, neither unimodal language models nor multimodal vision-language models reach human-level performance. Motivated by the need to model the contrast between the image content and the overlaid text, we suggest applying an off-the-shelf image captioning tool to capture the former. We demonstrate that incorporating such automatic captions during fine-tuning improves the results of various unimodal and multimodal models. Moreover, in the unimodal case, continuing the pre-training of language models on augmented and original caption pairs is highly beneficial to classification accuracy.
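The core idea of a caption-enriched sample can be illustrated with a minimal sketch. It assumes the automatic caption is simply appended to the overlaid meme text as a second segment, separated by a `[SEP]`-style marker, before the combined string is passed to a text classifier. The abstract does not specify the exact input format. The names `Meme`, `enrich_with_caption`, and `dummy_captioner` are hypothetical, and `dummy_captioner` stands in for whatever off-the-shelf captioning model is actually used.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Meme:
    image_path: str
    text: str   # text overlaid on the meme image
    label: int  # 1 = hateful, 0 = not hateful


def enrich_with_caption(
    meme: Meme, captioner: Callable[[str], str], sep: str = " [SEP] "
) -> str:
    """Pair the overlaid text with an automatic caption of the image.

    Assumption: the two pieces are joined into one string with a separator,
    so a text-only (unimodal) model can see both the text and a description
    of the image content.
    """
    caption = captioner(meme.image_path)
    return f"{meme.text.strip()}{sep}{caption.strip()}"


def dummy_captioner(image_path: str) -> str:
    # Hypothetical stand-in for an off-the-shelf image captioning model;
    # a real captioner would describe the image at image_path.
    return "a dog sitting on a couch"


if __name__ == "__main__":
    meme = Meme("img/42.png", "look who's ready for the weekend ", 0)
    print(enrich_with_caption(meme, dummy_captioner))
```

Keeping the captioner as an injected callable means the same enrichment step can be reused with any captioning model and tested without loading one.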
Sep-22-2021
- Country:
  - Europe > Belgium
    - Brussels-Capital Region > Brussels (0.04)
  - Asia > Middle East
    - Israel > Tel Aviv District > Tel Aviv (0.05)
- Genre:
  - Research Report (1.00)
- Technology: