Mitigating the Bias of Large Language Model Evaluation