CLAP4CLIP: Continual Learning with Probabilistic Finetuning for Vision-Language Models, Dong Gong

Neural Information Processing Systems 

Continual learning (CL) aims to help deep neural networks learn new knowledge while retaining what they have already learned. Owing to their strong generalizability, pretrained vision-language models such as Contrastive Language-Image Pre-training (CLIP) [1] have recently gained traction as practical CL candidates. However, the domain mismatch between pre-training and the downstream CL tasks often calls for finetuning CLIP on the latter.