LTLBench: Towards Benchmarks for Evaluating Temporal Logic Reasoning in Large Language Models