Words or Vision: Do Vision-Language Models Have Blind Faith in Text?
