Texture or Semantics? Vision-Language Models Get Lost in Font Recognition

Open in new window