Preserving Cross-Modal Stability for Visual Unlearning in Multimodal Scenarios