MoVA: Adapting Mixture of Vision Experts to Multimodal Context