Do CLIPs Always Generalize Better than ImageNet Models?