Can Pre-trained Vision and Language Models Answer Visual Information-Seeking Questions?

Open in new window