Video Representation Learning with Joint-Embedding Predictive Architectures

Open in new window