Self-supervised video pretraining yields robust and more human-aligned visual representations
