DHP Benchmark: Are LLMs Good NLG Evaluators?