Intriguing Differences Between Zero-Shot and Systematic Evaluations of Vision-Language Transformer Models
