Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark