Can Large Language Models be Good Path Planners? A Benchmark and Investigation on Spatial-temporal Reasoning