NLG Evaluation Metrics Beyond Correlation Analysis: An Empirical Metric Preference Checklist