Response Wide Shut? Surprising Observations in Basic Vision Language Model Capabilities
