Towards Compatible Fine-tuning for Vision-Language Model Updates