CropVLM: Learning to Zoom for Fine-Grained Vision-Language Perception