Rethinking Intermediate Representation for VLM-based Robot Manipulation