Vision Language Models See What You Want but not What You See

Open in new window