Do Vision-Language Models Really Understand Visual Language?
