Pretraining Frequency Predicts Compositional Generalization of CLIP on Real-World Tasks
