Towards Coarse-to-Fine Evaluation of Inference Efficiency for Large Language Models