M$^2$IV: Towards Efficient and Fine-grained Multimodal In-Context Learning via Representation Engineering
