Towards a Robust Framework for Multimodal Hate Detection: A Study on Video vs. Image-based Content