Phase Diagram of Vision Large Language Models Inference: A Perspective from Interaction across Image and Instruction