MoVA: Adapting Mixture of Vision Experts to Multimodal Context

Neural Information Processing Systems 

We conduct extensive experiments to evaluate the effectiveness of the proposed approach. Without any bells and whistles, MoVA achieves significant performance gains over current state-of-the-art methods across a wide range of challenging multimodal benchmarks.