Rethinking Intermediate Representation for VLM-based Robot Manipulation
