Homology Consistency Constrained Efficient Tuning for Vision-Language Models

Neural Information Processing Systems 

Efficient transfer learning has shown remarkable performance in tuning large-scale vision-language models (VLMs) for downstream tasks with limited data. The key challenge of efficient transfer lies in making image-text alignment task-specific while preserving pre-trained general knowledge.