Vision language models are unreliable at trivial spatial cognition

Open in new window