Can Multimodal LLMs do Visual Temporal Understanding and Reasoning? The answer is No!