A Step Towards Mixture of Grader: Statistical Analysis of Existing Automatic Evaluation Metrics