Hateful Memes Challenge: An Enhanced Multimodal Framework

Gao, Aijing, Wang, Bingjun, Yin, Jiaqi, Tian, Yating

arXiv.org Artificial Intelligence 

The Hateful Memes Challenge proposed by Facebook AI has attracted contestants from around the world. The challenge focuses on detecting hateful speech in multimodal memes. Various state-of-the-art deep learning models have been applied to this problem, and performance on the challenge's leaderboard has steadily improved. In this paper, we enhance the hateful meme detection framework by utilizing Detectron for feature extraction, exploring different setups of VisualBERT and UNITER models with different loss functions, studying the association between hateful memes and sensitive text features, and finally building an ensemble method to boost model performance.

Figure 1: Examples of Hateful Memes [3]

... the gap between the best model and human performance is still large. A recent study comparing hateful speech detection models for multimodal memes against humans reports accuracies of 64.73% and 84.7%, respectively [12], leaving much room for further ...