Do Vision-Language Models Really Understand Visual Language?