FILS: Self-Supervised Video Feature Prediction In Semantic Language Space

Open in new window