Fine-tuned CLIP Models are Efficient Video Learners
