Can visual language models resolve textual ambiguity with visual cues? Let visual puns tell you!

Open in new window