Do Multimodal Language Models Really Understand Direction? A Benchmark for Compass Direction Reasoning