Does CLIP's Generalization Performance Mainly Stem from High Train-Test Similarity?
