Vision language models are blind
