Challenges of Zero-Shot Recognition with Vision-Language Models: Granularity and Correctness
