VIM: Probing Multimodal Large Language Models for Visual Embedded Instruction Following