Can Vision-Language Models be a Good Guesser? Exploring VLMs for Times and Location Reasoning
