MMICL: Empowering Vision-language Model with Multi-Modal In-Context Learning