What's Missing in Vision-Language Models? Probing Their Struggles with Causal Order Reasoning