Self-supervised video pretraining yields human-aligned visual representations
