UrbanVLP: Multi-Granularity Vision-Language Pretraining for Urban Region Profiling