Overcoming the Pitfalls of Vision-Language Model Finetuning for OOD Generalization