Align before Attend: Aligning Visual and Textual Features for Multimodal Hateful Content Detection
