Video models are zero-shot learners and reasoners