Rethinking and Benchmarking Large Language Models for Graph Reasoning