An Inverse Scaling Law for CLIP Training

Neural Information Processing Systems 

The impact of CLIP has been profound, not only in significantly advancing models' zero/few-shot capabilities and out-of-distribution generalization [