MMICL: Empowering Vision-language Model with Multi-Modal In-Context Learning
