Revisiting Temporal Modeling for CLIP-based Image-to-Video Knowledge Transferring
