Image-based Multimodal Models as Intruders: Transferable Multimodal Attacks on Video-based MLLMs