Video models are zero-shot learners and reasoners
